How to Use Python With Excel Files
TechnologyReady to leverage the power of Python? If you have data in Excel, Python is just what you need to make the most of those files.
First though you need to get the data into Python. Luckily, reading Excel files is simple and easy. All you need to do is install a few modules.
Using pip install the following. If you are using environments make sure you are working in the right environment.
The modules are pandas, xlrd, and openpyxl. They should install pretty fast.
Preparing Python to Work With Excel Files
$pip install pandas
$pip install xlrd
$pip install openpyxl
With the modules installed via pip you are ready to get to work!
Import Excel Workbooks into Python with Openpyxl Module and Pandas
First thing we need to do is import the load_workbook function from openpyxl. With load_workbook we can now load the workbook and see the sheetnames. Using pandas we then load the data from any sheet we wish into a DataFrame.
import pandas as pd
from openpyxl import load_workbook
# Load the workbook
wb = load_workbook('Customers.xlsx')
# Select the worksheet to use
ws = wb.worksheets[1]
# Load the worksheet in Dataframe
df = pd.DataFrame(ws.values)
# Finished with data. Save to new file.
wb.save('Customers-Saved.xlsx')
Checking we can see the sheetnames list from the workbook and verify the data for the second sheet is indeed loaded in the DataFrame.
>>>
>>> wb.sheetnames
['Customers', 'Work']
>>> df
0 1
0 Profession Work Experience
1 Healthcare 1
2 Engineer 3
3 Engineer 1
4 Lawyer 0
... ... ...
1996 Artist 8
1997 Doctor 7
1998 Healthcare 9
1999 Executive 7
2000 Entertainment 5
[2001 rows x 2 columns]
>>>
Analyzing Excel Data with Python and Pandas
With the data loaded into a DataFrame you have the power of Python and Pandas at your fingertips. I am not going to describe how to do analysis or visualizations of your data in this article. That could easily take a book, or two.
Here is a short list possibilities once the data is loaded:
- Graphs and charts using Matplotlib. Many styles are available.
- Exporting charts to Excel.
- Matrix operations.
- Statistics.
- Data Cleaning and preparation. This can take most of your time in any data project. Real world data is messy! Python is an excellent tool for cleaning up data messes.
I am barely scratching the surface. You can do most things you might want to do with Python and pandas.
Saving Data from Python to Excel
Finally we get to the last line where we save our work back to an Excel file. Warning: These methods will overwrite existing files without warning. In this example only the active worksheet gets save to the new workbook.
# Finished with data. Save to new file.
wb.save('Customers-Saved.xlsx')
All done!
As you can see saving is pretty easy. Though we didn’t show examples there are quite a few options to explore. Saving multiple sheets, adding sheets to existing files, appending data, saving charts, and more.
You’ve seen what you need to do to use Python for reading and working with Excel files. Installing the modules and loading an Excel file for analysis were pretty straightforward. When you are done saving is pretty easy too.
note: Sample data is from Kaggle and was modified for these examples.
Share with Friends!