Wednesday, November 28, 2018

Working with Excel worksheets in Python using openpyxl.

Excel is spreadsheet software developed by Microsoft and there are free alternatives which work with the .xlsx extension.

These alternatives are OpenOffice Calc and LibreOffice Calc. They can both work with Linux, OS X and Windows as well.

Can I use Python with Excel?

Python has libraries that you can use to work with Excel files. These libraries include:

  • openpyxl
  • xlsxwriter
  • xlrd
  • xlwt
  • xlutils

This tutorial will use openpyxl library to write python programs to read and modify Excel worksheet files.

If you are using OpenOffice Calc and LibreOffice Calc, openpyxl can still work with these spreadsheet applications.

Working with Excel using Python openpyxl module

How to install openpyxl module

Openpyxl is a third-party Python module, therefore, you have to install it.

Pip is a tool in Python used to install third-party Python modules. Get pip on your computer and on your command prompt run,

pip install openpyxl

I should warn you,

Always have a duplicate of any Excel file you are working with to prevent mistakenly overwriting and losing vital information.

Embody the good practice of working with a duplicate file.

Creating and saving an Excel file.

The code snippet below creates a workbook and assigns a value to cell A1, A2, B1 and B2 and then saves it.

import openpyxl

filepath = "C:\\Users\\RAPTURE C. GODSON\\Documents\\openpyxl\\new_file.xlsx"

#create a workbook
wb = openpyxl.Workbook()

#Get a sheet to work with
ws = wb.active

#Assign Values to cell
ws["A1"] = "Name"
ws["B1"] = "Age"
ws["A2"] = "Frank"
ws["B2"] = 28

#Saving workbook
wb.save(filepath)

Here is the output of this program.

create and save worksheet using openpyxl python module

Okay, it is time I explain what I did.

First step:

import openpyxl module so you call use  its functions and class to work with an excel file.

Second step:

A full path of a directory I want to save my xlsx file was provided with the name of the file at the end.

For windows users, use backslash(\) when writing your path.

If you are an OS X or Linux user, use forward slash(/).

Don’t forget to use double slashes because each slash has to be escaped by another slash.

Third step:

The Workbook class creates a workbook with at least one sheet.

Fourth step:

The next step is to get a worksheet to work with.

ws = wb.active

The code above gets the first worksheet in a workbook.

Fifth step:

Values are directly assigned to cells. An alternative way to do this is to use the code snippets below.

#Alternative way to assign Values to a cell

ws.cell(row=1,column=1,value="Name")

ws.cell(row=2,column=1,value="Frank")

ws.cell(row=1,column=2,value="Age")

ws.cell(row=2,column=2,value=28)

There is another way of assigning values to cell using the append() method. I will show you how to use it in the next section.

Last step:

Save your workbook.

Assigning values to cells using append() method

In this section,

We will learn how to:

  • open an existing workbook
  • Create a new sheet and
  • Append values to cells

Note:

We will work with the same file name new_file.xlsx or file path.

Just search for this file in the download folder you downloaded this tutorial. Download link is at the end of the tutorial.

Copy the file path and replace it with the one in the code snippet below.

The new_file.xlsx will be modified as we run the code below.

import openpyxl

filepath = "C:\\Users\\RAPTURE C. GODSON\\Documents\\openpyxl\\new_file.xlsx"

#Open workbook
wb = openpyxl.load_workbook(filepath)

#create new sheets
sheet_one = wb.create_sheet("Student details")
sheet_two = wb.create_sheet("Weather Report")
sheet_three = wb.create_sheet("Lecturers details")

#Get Student details sheet
ws = wb["Student details"]

#Creating cell values
data_frame = [
    ["Name","Gender","CGPA"],
    ["Godson","Male",3.7],
    ["Esther","Female",3.0],
    ["Frank","Male",3.5],
    ["Sandra","Female",3.8]
]

#Appending cell values to cell
for data in data_frame:
    ws.append(data)

#Saving workbook
wb.save(filepath)

Here is the output of the code snippet above.

creating new worksheet with openpyxl

I am sure you don’t have any problem understanding the code snippet.

If you do, please let me know via the comment section.

Note:

You can determine the arrangement or position of your worksheet.

The line of code below will make the Lecturers details worksheet to be the first worksheet.

sheet_three = wb.create_sheet("Lecturers details",index=0)

Accessing values of cells in a worksheet

Once you have a Worksheet object, you can access a cell object i.e the value of the cell by its name.

Note:

If you downloaded this tutorial, you will load modified_new_file.xlsx. Get the file path and make the necessary changes in the code snippets below.

If you have challenges getting the file path, follow the steps below.

Step 1:

Open a blank workbook in Microsoft Excel and click on file.

The picture below illustrates this step.

Getting the file path of a workbook in excel

Step 2: 

The following steps are illustrated in the screenshot below.

  • Click open
  • Click on Recent Workbboks
  • Find the file you want its path.(modified_new_file.xlsx)
  • Click on the file
  • Click on Copy path to clipboard

A tip is to open modified_new_file.xlsx and exiting so you can see it on your recent workbooks.

These steps apply for Excel on windows, I don’t know if it is applicable to other spreadsheet applications but I little google search won’t hurt you.

copying an excel workbook path to clipboard on windows

Enter the following code on Python interactive shell and replace the file path

>>> from openpyxl import load_workbook
>>> filepath = "C:\\Users\\RAPTURE C. GODSON\\Documents\\openpyxl\\modified_new_file.xlsx"
>>> wb = load_workbook(filepath)
>>> ws = wb["Student details"]
>>> ws["A1"].value
'Name'
>>> print(ws["A2"].value)
Godson
>>> print(ws["B2"].value)
Male
>>> print(ws["B1"].value)
Gender

The next section will work with “my_project.xlsx” located in the folder you downloaded for this tutorial.

Remember to get the file path and make the necessary changes in the code snippets below.

Here is a screenshot of the workbook you will be working with.

screenshot of an Excel workbook

Getting all the values in column A

The code snippet below gets all the value of cells in column A.

from openpyxl import load_workbook

#file path of workbook
file_path = "C:\\Users\\RAPTURE C. GODSON\\Documents\\openpyxl\\my_project.xlsx"

#open workbook
wb = load_workbook(file_path)

#Get the first worksheet in workbook
ws = wb.active

#Get the maximum number of rows
max_row = ws.max_row

#Get the maximum number of rows
min_row = ws.min_row

for i in range(min_row,max_row+1):
    print(ws.cell(row=i, column=1).value)

Iterating by columns

The iter_cols() method returns cells from the worksheet as columns.

The code snippet below iterates data column by column.

from openpyxl import load_workbook

#file path of workbook
file_path = "C:\\Users\\RAPTURE C. GODSON\\Documents\\openpyxl\\my_project.xlsx"

#open workbook
wb = load_workbook(file_path)

#Get the first worksheet in workbook
ws = wb.active

for col in ws.iter_cols(min_row=1,max_col=4,max_row=11):
    for cell in col:
        print(cell.value,end=" ")
    print()

The first for loop provides the boundary for the iteration.

Below is the output of the code snippet above.

iterating the column of a worksheet with iter_cols

The output is a bit scattered but hope you noticed that for every line, all the values in a particular column were printed out.

Iterating by rows

The iter_rows() method return cells from the worksheet as rows.

The code snippet below iterates data row by row.

from openpyxl import load_workbook

#file path of workbook
file_path = "C:\\Users\\RAPTURE C. GODSON\\Documents\\openpyxl\\my_project.xlsx"

#open workbook
wb = load_workbook(file_path)

#Get the first worksheet in workbook
ws = wb.active

for col in ws.iter_rows(min_row=1,max_col=4,max_row=11):
    for cell in col:
        print(cell.value,end=" ")
    print()

Below is the output of the code snippet below.

openpyxl iter_rows

The output is a bit scattered but hope you noticed that for every line, all the values in a particular row were printed out.

Charts

With the openpyxl library, you can create various types of chart including bar charts, line charts,area charts, bubble charts, scatter charts, and pie charts for a set of data.

Warning:

Openpyxl currently supports chart creation within a worksheet only. Charts in existing workbooks will be lost.

In the next section,

we will still work with the “my_project.xlsx” in the folder you downloaded for this tutorial. Get the file path of the file and make the necessary changes in the code snippets in this section.

Before we can plot a X Y scatter chart for our my_project.xlsx file, we need two columns for the x and y axis of the chart.

We need column A(Temperature data) and column D(heat flow data).

The code snippets below copies the value of cells in column A and column D to another worksheet in the same file.

from openpyxl import load_workbook

#file path of workbook
file_path = "C:\\Users\\RAPTURE C. GODSON\\Documents\\openpyxl\\my_project.xlsx"

#open workbook
wb = load_workbook(file_path)

#Get the first worksheet in workbook
ws = wb.active

#Get the maximum number of rows
max_row = ws.max_row

#put the value of cells in a particular column in a list 
temp = [ws.cell(column=1,row=i).value for i in range(1,max_row+1) ]
heat_flow = [ws.cell(column=4,row=i).value for i in range(1,max_row+1) ]

#Create your data frame
dataframe = list(zip(temp,heat_flow))

#create new sheet
new_sheet = wb.create_sheet("Chart Data")
 
#Append your data in a new worksheet
for data in dataframe:
    new_sheet.append(data)

wb.save(file_path)

Below is the output of the code snippet below.

coping data from one sheet to another using openpyxl python module

I know some people may be confused with some lines of code so I will go through them.

The below code snippet uses the concept of list comprehension to provide a list of the value of cells in a column.

#put the value of cells in a particular column in a list
temp = [ws.cell(column=1,row=i).value for i in range(1,max_row+1) ]
heat_flow = [ws.cell(column=4,row=i).value for i in range(1,max_row+1) ]

If you added the lines of code below to our program,

print(temp)
print(heat_flow)

You will have an output below:

['Temp ', 25, 60, 96, 132, 169, 205, 241, 278, 312, 348, 384]
['Heat Flow ', 0, 386437, 804919, 1254842, 1735594, 2246626, 2787547, 3358262, 3959242, 4592128, 5261384]

The code snippet below uses the zip() function

#Create your data frame
dataframe = list(zip(temp,heat_flow))

The zip function returns an iterator of tuples, where the i-th tuple contains the i-th element from each of the argument sequences or iterables.

Observe the code snippets below that was entered into Python interactive shell.

>>> temp = ['Temp ', 25, 60, 96, 132, 169, 205, 241, 278, 312, 348, 384]
>>> heat = ['Heat Flow ', 0, 386437, 804919, 1254842, 1735594, 2246626, 2787547, 3358262, 3959242, 4592128, 5261384]
>>> dataframe = list(zip(temp,heat))
>>> print(dataframe)
[('Temp ', 'Heat Flow '), (25, 0), (60, 386437), (96, 804919), (132, 1254842), (169, 1735594), (205, 2246626), (241, 2787547), (278, 3358262), (312, 3959242), (348, 4592128), (384, 5261384)]

As the dataframe is printed a list of tuples is the output.

The zip() function picks the first element in temp and heat list and arranges them in a tuple and continues to the end of the list.

Note:

The iterator stops when the shortest input iterable is exhausted.

Now, we are ready to create our X Y scatter chart.

Scatter chart

This section will work with “modified_my_project.xlsx” located in the download file of this tutorial.

Don’t forget to copy the file path of the file and make the necessary changes in the code snippet below.

The code snippet below creates our chart.

from openpyxl import load_workbook
from openpyxl.chart import (
ScatterChart,
Reference,
Series,
)

file_path = "C:\\Users\\RAPTURE C. GODSON\\Documents\\openpyxl\\modified_my_project.xlsx"

wb = load_workbook(file_path)

ws = wb["Chart Data"]

chart = ScatterChart()
chart.title = "Heat duty profile of crude flow"
chart.style = 13
chart.x_axis.title = 'Temperature of Crude'
chart.y_axis.title = 'Heat Duty'

xvalues = Reference(ws, min_col=1, min_row=2, max_row=ws.max_row)
yvalues = Reference(ws,min_col=2,min_row=1,max_row=ws.max_row)
series = Series(yvalues,xvalues,title_from_data=True)
chart.series.append(series)
ws.add_chart(chart,"D4")
wb.save(file_path)

Below is the output of the code snippet above.

openpyxl scatter chart

Refer to openpyxl official documentation to learn how to create other types of chart.

Task 1

Sending Emails to address stored in an excel file

Let say you are running a small business and you want to remind clients of the amount they owe you through email.

The email can just be like this:

Hello John Mark,

Please, we want to remind you that you owe us $200

-xyz company

You may like to be able to change the name and amount in each message you want to send.

The client’s details (name, amount owning, email address) are in an Excel workbook.

Doing this task manually, can be annoying,

But a simple Python scripts can do the job for you in seconds once the script has been written.

Warning:

  • Email services have a daily limit for sending out emails. Don’t try this with 10000 emails so you don’t get blocked. I think 10-20 emails are good.
  • I would advise you create a new email account if you have not tried to send emails using Python preferably with gmail.
  • For gmail users, turn on your lesser secure apps setting so you can login to your email using Python

Don’t be scared of these warning, I am just protecting you if you are trying this out for the first time.

The excel file you will be working in this section is email_list.xlsx located in the folder you downloaded for this tutorial.

Please, I didn’t provide any real person email so change the emails to yours or friend’s email. Let them know you are using them email.

Below is the screenshot of the workbook you will be working with.

customer details stored in a workbook to send email

The below code snippet will do the task that was mention above.

import openpyxl
import smtplib

file_path = "C:\\Users\\RAPTURE C. GODSON\\Documents\\Email marketing\\deleteme.xlsx"
wb = openpyxl.load_workbook(file_path)

#Get sheet where client's details are in
sheet = wb["customer"]

names = [sheet.cell(row=i,column=1).value for i in range(2,6)]
email = [sheet.cell(row=i,column=2).value for i in range(2,6)]
amount = [sheet.cell(row=i,column=3).value for i in range(2,6)]

#message to send to clients
message = "Subject: Reminder of your credit\n\nHello {0},\n Please, we want to remind you that you owe us ${1}\n -XYZ company"

host = "smtp.gmail.com"
port = 587
username = "Enter your email"
password = "Enter your email password"
s = smtplib.SMTP(host,port)
s.ehlo()
s.starttls()
s.login(username,password)
i = 0

while i < len(email):
    for name in names:
        if name==None:
            pass
        else:
            s.sendmail(username,email[i],message.format(name,amount[i]))  
        i+=1
        
s.quit()

Note:

When I created the list emails, name and account, I purposely exceed the rows that had data.

This code below prevents my program from crashing when a cell does not have a value.

if name==None:
    pass

If you take time to go through the code, you will be able to understand.

Here is a guide on how to send email using Python.

Below is a screenshot of one of the email sent.

email sent using python smptlib and openpyxl modules

Task 2:

Let’s say you have an email list of 1 million people from different countries.

The first column has the email addresses of the people.

The second column has the location of the people.

For a good reason, you are interested in getting the email addresses of people located in the United States.

Doing this may take sleepless nights.

The code snippets will easily do the task for you.

Sorry, I can’t provide an email list to demonstrate this, if you have one, you can work with it.

from openpyxl import load_workbook

filepath = "C:\\Users\\RAPTURE C. GODSON\\Documents\\openpyxl\\email_list.xlsx"

wb = load_workbook(filepath)

ws = wb["Email"]

max_row = ws.max_row

for i in range(1,max_row+1):
    if ws.cell(row=i,column=2).value == "United States":
        print(ws.cell(row=i,column=1).value)

You have come to the end of this tutorial.

Tell your friends on Twitter that you now know how to work with Excel using PythonClick To Tweet

Please don’t forget to:

Please if you have any questions or contributions, please leave it in the comment section.

Thanks for reading. If you enjoyed this tutorial, hit the share button. It would mean a lot to me and it helps other people to see the tutorial.

Learn how to convert HTML to XLS or PDF files using Python.

The post Working with Excel worksheets in Python using openpyxl. appeared first on Cool Python Codes.



from Cool Python Codes
via read more

No comments:

Post a Comment

TestDriven.io: Working with Static and Media Files in Django

This article looks at how to work with static and media files in a Django project, locally and in production. from Planet Python via read...