Excel is spreadsheet software developed by Microsoft and there are free alternatives which work with the .xlsx extension.
These alternatives are OpenOffice Calc and LibreOffice Calc. They can both work with Linux, OS X and Windows as well.
Can I use Python with Excel?
Python has libraries that you can use to work with Excel files. These libraries include:
- openpyxl
- xlsxwriter
- xlrd
- xlwt
- xlutils
This tutorial will use openpyxl library to write python programs to read and modify Excel worksheet files.
If you are using OpenOffice Calc and LibreOffice Calc, openpyxl can still work with these spreadsheet applications.
How to install openpyxl module
Openpyxl is a third-party Python module, therefore, you have to install it.
Pip is a tool in Python used to install third-party Python modules. Get pip on your computer and on your command prompt run,
pip install openpyxl
I should warn you,
Always have a duplicate of any Excel file you are working with to prevent mistakenly overwriting and losing vital information.
Embody the good practice of working with a duplicate file.
Creating and saving an Excel file.
The code snippet below creates a workbook and assigns a value to cell A1, A2, B1 and B2 and then saves it.
import openpyxl filepath = "C:\\Users\\RAPTURE C. GODSON\\Documents\\openpyxl\\new_file.xlsx" #create a workbook wb = openpyxl.Workbook() #Get a sheet to work with ws = wb.active #Assign Values to cell ws["A1"] = "Name" ws["B1"] = "Age" ws["A2"] = "Frank" ws["B2"] = 28 #Saving workbook wb.save(filepath)
Here is the output of this program.
Okay, it is time I explain what I did.
First step:
import openpyxl module so you call use its functions and class to work with an excel file.
Second step:
A full path of a directory I want to save my xlsx file was provided with the name of the file at the end.
For windows users, use backslash(\) when writing your path.
If you are an OS X or Linux user, use forward slash(/).
Don’t forget to use double slashes because each slash has to be escaped by another slash.
Third step:
The Workbook class creates a workbook with at least one sheet.
Fourth step:
The next step is to get a worksheet to work with.
ws = wb.active
The code above gets the first worksheet in a workbook.
Fifth step:
Values are directly assigned to cells. An alternative way to do this is to use the code snippets below.
#Alternative way to assign Values to a cell ws.cell(row=1,column=1,value="Name") ws.cell(row=2,column=1,value="Frank") ws.cell(row=1,column=2,value="Age") ws.cell(row=2,column=2,value=28)
There is another way of assigning values to cell using the append() method. I will show you how to use it in the next section.
Last step:
Save your workbook.
Assigning values to cells using append() method
In this section,
We will learn how to:
- open an existing workbook
- Create a new sheet and
- Append values to cells
Note:
We will work with the same file name new_file.xlsx or file path.
Just search for this file in the download folder you downloaded this tutorial. Download link is at the end of the tutorial.
Copy the file path and replace it with the one in the code snippet below.
The new_file.xlsx will be modified as we run the code below.
import openpyxl filepath = "C:\\Users\\RAPTURE C. GODSON\\Documents\\openpyxl\\new_file.xlsx" #Open workbook wb = openpyxl.load_workbook(filepath) #create new sheets sheet_one = wb.create_sheet("Student details") sheet_two = wb.create_sheet("Weather Report") sheet_three = wb.create_sheet("Lecturers details") #Get Student details sheet ws = wb["Student details"] #Creating cell values data_frame = [ ["Name","Gender","CGPA"], ["Godson","Male",3.7], ["Esther","Female",3.0], ["Frank","Male",3.5], ["Sandra","Female",3.8] ] #Appending cell values to cell for data in data_frame: ws.append(data) #Saving workbook wb.save(filepath)
Here is the output of the code snippet above.
I am sure you don’t have any problem understanding the code snippet.
If you do, please let me know via the comment section.
Note:
You can determine the arrangement or position of your worksheet.
The line of code below will make the Lecturers details worksheet to be the first worksheet.
sheet_three = wb.create_sheet("Lecturers details",index=0)
Accessing values of cells in a worksheet
Once you have a Worksheet object, you can access a cell object i.e the value of the cell by its name.
Note:
If you downloaded this tutorial, you will load modified_new_file.xlsx. Get the file path and make the necessary changes in the code snippets below.
If you have challenges getting the file path, follow the steps below.
Step 1:
Open a blank workbook in Microsoft Excel and click on file.
The picture below illustrates this step.
Step 2:
The following steps are illustrated in the screenshot below.
- Click open
- Click on Recent Workbboks
- Find the file you want its path.(modified_new_file.xlsx)
- Click on the file
- Click on Copy path to clipboard
A tip is to open modified_new_file.xlsx and exiting so you can see it on your recent workbooks.
These steps apply for Excel on windows, I don’t know if it is applicable to other spreadsheet applications but I little google search won’t hurt you.
Enter the following code on Python interactive shell and replace the file path
>>> from openpyxl import load_workbook >>> filepath = "C:\\Users\\RAPTURE C. GODSON\\Documents\\openpyxl\\modified_new_file.xlsx" >>> wb = load_workbook(filepath) >>> ws = wb["Student details"] >>> ws["A1"].value 'Name' >>> print(ws["A2"].value) Godson >>> print(ws["B2"].value) Male >>> print(ws["B1"].value) Gender
The next section will work with “my_project.xlsx” located in the folder you downloaded for this tutorial.
Remember to get the file path and make the necessary changes in the code snippets below.
Here is a screenshot of the workbook you will be working with.
Getting all the values in column A
The code snippet below gets all the value of cells in column A.
from openpyxl import load_workbook #file path of workbook file_path = "C:\\Users\\RAPTURE C. GODSON\\Documents\\openpyxl\\my_project.xlsx" #open workbook wb = load_workbook(file_path) #Get the first worksheet in workbook ws = wb.active #Get the maximum number of rows max_row = ws.max_row #Get the maximum number of rows min_row = ws.min_row for i in range(min_row,max_row+1): print(ws.cell(row=i, column=1).value)
Iterating by columns
The iter_cols() method returns cells from the worksheet as columns.
The code snippet below iterates data column by column.
from openpyxl import load_workbook #file path of workbook file_path = "C:\\Users\\RAPTURE C. GODSON\\Documents\\openpyxl\\my_project.xlsx" #open workbook wb = load_workbook(file_path) #Get the first worksheet in workbook ws = wb.active for col in ws.iter_cols(min_row=1,max_col=4,max_row=11): for cell in col: print(cell.value,end=" ") print()
The first for loop provides the boundary for the iteration.
Below is the output of the code snippet above.
The output is a bit scattered but hope you noticed that for every line, all the values in a particular column were printed out.
Iterating by rows
The iter_rows() method return cells from the worksheet as rows.
The code snippet below iterates data row by row.
from openpyxl import load_workbook #file path of workbook file_path = "C:\\Users\\RAPTURE C. GODSON\\Documents\\openpyxl\\my_project.xlsx" #open workbook wb = load_workbook(file_path) #Get the first worksheet in workbook ws = wb.active for col in ws.iter_rows(min_row=1,max_col=4,max_row=11): for cell in col: print(cell.value,end=" ") print()
Below is the output of the code snippet below.
The output is a bit scattered but hope you noticed that for every line, all the values in a particular row were printed out.
Charts
With the openpyxl library, you can create various types of chart including bar charts, line charts,area charts, bubble charts, scatter charts, and pie charts for a set of data.
Warning:
Openpyxl currently supports chart creation within a worksheet only. Charts in existing workbooks will be lost.
In the next section,
we will still work with the “my_project.xlsx” in the folder you downloaded for this tutorial. Get the file path of the file and make the necessary changes in the code snippets in this section.
Before we can plot a X Y scatter chart for our my_project.xlsx file, we need two columns for the x and y axis of the chart.
We need column A(Temperature data) and column D(heat flow data).
The code snippets below copies the value of cells in column A and column D to another worksheet in the same file.
from openpyxl import load_workbook #file path of workbook file_path = "C:\\Users\\RAPTURE C. GODSON\\Documents\\openpyxl\\my_project.xlsx" #open workbook wb = load_workbook(file_path) #Get the first worksheet in workbook ws = wb.active #Get the maximum number of rows max_row = ws.max_row #put the value of cells in a particular column in a list temp = [ws.cell(column=1,row=i).value for i in range(1,max_row+1) ] heat_flow = [ws.cell(column=4,row=i).value for i in range(1,max_row+1) ] #Create your data frame dataframe = list(zip(temp,heat_flow)) #create new sheet new_sheet = wb.create_sheet("Chart Data") #Append your data in a new worksheet for data in dataframe: new_sheet.append(data) wb.save(file_path)
Below is the output of the code snippet below.
I know some people may be confused with some lines of code so I will go through them.
The below code snippet uses the concept of list comprehension to provide a list of the value of cells in a column.
#put the value of cells in a particular column in a list temp = [ws.cell(column=1,row=i).value for i in range(1,max_row+1) ] heat_flow = [ws.cell(column=4,row=i).value for i in range(1,max_row+1) ]
If you added the lines of code below to our program,
print(temp) print(heat_flow)
You will have an output below:
['Temp ', 25, 60, 96, 132, 169, 205, 241, 278, 312, 348, 384] ['Heat Flow ', 0, 386437, 804919, 1254842, 1735594, 2246626, 2787547, 3358262, 3959242, 4592128, 5261384]
The code snippet below uses the zip() function
#Create your data frame dataframe = list(zip(temp,heat_flow))
The zip function returns an iterator of tuples, where the i-th tuple contains the i-th element from each of the argument sequences or iterables.
Observe the code snippets below that was entered into Python interactive shell.
>>> temp = ['Temp ', 25, 60, 96, 132, 169, 205, 241, 278, 312, 348, 384] >>> heat = ['Heat Flow ', 0, 386437, 804919, 1254842, 1735594, 2246626, 2787547, 3358262, 3959242, 4592128, 5261384] >>> dataframe = list(zip(temp,heat)) >>> print(dataframe) [('Temp ', 'Heat Flow '), (25, 0), (60, 386437), (96, 804919), (132, 1254842), (169, 1735594), (205, 2246626), (241, 2787547), (278, 3358262), (312, 3959242), (348, 4592128), (384, 5261384)]
As the dataframe is printed a list of tuples is the output.
The zip() function picks the first element in temp and heat list and arranges them in a tuple and continues to the end of the list.
Note:
The iterator stops when the shortest input iterable is exhausted.
Now, we are ready to create our X Y scatter chart.
Scatter chart
This section will work with “modified_my_project.xlsx” located in the download file of this tutorial.
Don’t forget to copy the file path of the file and make the necessary changes in the code snippet below.
The code snippet below creates our chart.
from openpyxl import load_workbook from openpyxl.chart import ( ScatterChart, Reference, Series, ) file_path = "C:\\Users\\RAPTURE C. GODSON\\Documents\\openpyxl\\modified_my_project.xlsx" wb = load_workbook(file_path) ws = wb["Chart Data"] chart = ScatterChart() chart.title = "Heat duty profile of crude flow" chart.style = 13 chart.x_axis.title = 'Temperature of Crude' chart.y_axis.title = 'Heat Duty' xvalues = Reference(ws, min_col=1, min_row=2, max_row=ws.max_row) yvalues = Reference(ws,min_col=2,min_row=1,max_row=ws.max_row) series = Series(yvalues,xvalues,title_from_data=True) chart.series.append(series) ws.add_chart(chart,"D4") wb.save(file_path)
Below is the output of the code snippet above.
Refer to openpyxl official documentation to learn how to create other types of chart.
Task 1
Sending Emails to address stored in an excel file
Let say you are running a small business and you want to remind clients of the amount they owe you through email.
The email can just be like this:
Hello John Mark,
Please, we want to remind you that you owe us $200
-xyz company
You may like to be able to change the name and amount in each message you want to send.
The client’s details (name, amount owning, email address) are in an Excel workbook.
Doing this task manually, can be annoying,
But a simple Python scripts can do the job for you in seconds once the script has been written.
Warning:
- Email services have a daily limit for sending out emails. Don’t try this with 10000 emails so you don’t get blocked. I think 10-20 emails are good.
- I would advise you create a new email account if you have not tried to send emails using Python preferably with gmail.
- For gmail users, turn on your lesser secure apps setting so you can login to your email using Python
Don’t be scared of these warning, I am just protecting you if you are trying this out for the first time.
The excel file you will be working in this section is email_list.xlsx located in the folder you downloaded for this tutorial.
Please, I didn’t provide any real person email so change the emails to yours or friend’s email. Let them know you are using them email.
Below is the screenshot of the workbook you will be working with.
The below code snippet will do the task that was mention above.
import openpyxl import smtplib file_path = "C:\\Users\\RAPTURE C. GODSON\\Documents\\Email marketing\\deleteme.xlsx" wb = openpyxl.load_workbook(file_path) #Get sheet where client's details are in sheet = wb["customer"] names = [sheet.cell(row=i,column=1).value for i in range(2,6)] email = [sheet.cell(row=i,column=2).value for i in range(2,6)] amount = [sheet.cell(row=i,column=3).value for i in range(2,6)] #message to send to clients message = "Subject: Reminder of your credit\n\nHello {0},\n Please, we want to remind you that you owe us ${1}\n -XYZ company" host = "smtp.gmail.com" port = 587 username = "Enter your email" password = "Enter your email password" s = smtplib.SMTP(host,port) s.ehlo() s.starttls() s.login(username,password) i = 0 while i < len(email): for name in names: if name==None: pass else: s.sendmail(username,email[i],message.format(name,amount[i])) i+=1 s.quit()
Note:
When I created the list emails, name and account, I purposely exceed the rows that had data.
This code below prevents my program from crashing when a cell does not have a value.
if name==None: pass
If you take time to go through the code, you will be able to understand.
Here is a guide on how to send email using Python.
Below is a screenshot of one of the email sent.
Task 2:
Let’s say you have an email list of 1 million people from different countries.
The first column has the email addresses of the people.
The second column has the location of the people.
For a good reason, you are interested in getting the email addresses of people located in the United States.
Doing this may take sleepless nights.
The code snippets will easily do the task for you.
Sorry, I can’t provide an email list to demonstrate this, if you have one, you can work with it.
from openpyxl import load_workbook filepath = "C:\\Users\\RAPTURE C. GODSON\\Documents\\openpyxl\\email_list.xlsx" wb = load_workbook(filepath) ws = wb["Email"] max_row = ws.max_row for i in range(1,max_row+1): if ws.cell(row=i,column=2).value == "United States": print(ws.cell(row=i,column=1).value)
You have come to the end of this tutorial.
Tell your friends on Twitter that you now know how to work with Excel using PythonClick To TweetPlease don’t forget to:
Please if you have any questions or contributions, please leave it in the comment section.
Thanks for reading. If you enjoyed this tutorial, hit the share button. It would mean a lot to me and it helps other people to see the tutorial.
Learn how to convert HTML to XLS or PDF files using Python.
The post Working with Excel worksheets in Python using openpyxl. appeared first on Cool Python Codes.
from Cool Python Codes
via read more
No comments:
Post a Comment