HexTxt

How to Use Python With Excel Files

Technology
analyzing business data with python and Excel

Open Excel Files with Python

Ready to leverage the power of Python? If you have data in Excel, Python is just what you need to make the most of those files.

First though you need to get the data into Python. Luckily, reading Excel files is simple and easy. All you need to do is install a few modules.

Using pip install the following. If you are using environments make sure you are working in the right environment.

The modules are pandas, xlrd, and openpyxl. They should install pretty fast.

Preparing Python to Work With Excel Files

$pip install pandas
$pip install xlrd
$pip install openpyxl

With the modules installed via pip you are ready to get to work!

Import Excel Workbooks into Python with Openpyxl Module and Pandas

First thing we need to do is import the load_workbook function from openpyxl. With load_workbook we can now load the workbook and see the sheetnames. Using pandas we then load the data from any sheet we wish into a DataFrame.

import pandas as pd
from openpyxl import load_workbook

# Load the workbook
wb = load_workbook('Customers.xlsx')

# Select the worksheet to use
ws = wb.worksheets[1]

# Load the worksheet in Dataframe
df = pd.DataFrame(ws.values)

# Finished with data. Save to new file.
wb.save('Customers-Saved.xlsx')

Checking we can see the sheetnames list from the workbook and verify the data for the second sheet is indeed loaded in the DataFrame.

>>>
>>> wb.sheetnames
['Customers', 'Work']
>>> df
                  0                1
0        Profession  Work Experience
1        Healthcare                1
2          Engineer                3
3          Engineer                1
4            Lawyer                0
...             ...              ...
1996         Artist                8
1997         Doctor                7
1998     Healthcare                9
1999      Executive                7
2000  Entertainment                5

[2001 rows x 2 columns]
>>>

Analyzing Excel Data with Python and Pandas

With the data loaded into a DataFrame you have the power of Python and Pandas at your fingertips. I am not going to describe how to do analysis or visualizations of your data in this article. That could easily take a book, or two.

Here is a short list possibilities once the data is loaded:

I am barely scratching the surface. You can do most things you might want to do with Python and pandas.

Saving Data from Python to Excel

Finally we get to the last line where we save our work back to an Excel file. Warning: These methods will overwrite existing files without warning. In this example only the active worksheet gets save to the new workbook.

# Finished with data. Save to new file.
wb.save('Customers-Saved.xlsx')

All done!

As you can see saving is pretty easy. Though we didn’t show examples there are quite a few options to explore. Saving multiple sheets, adding sheets to existing files, appending data, saving charts, and more.

You’ve seen what you need to do to use Python for reading and working with Excel files. Installing the modules and loading an Excel file for analysis were pretty straightforward. When you are done saving is pretty easy too.

note: Sample data is from Kaggle and was modified for these examples.

Share with Friends!
LinkedIn
Reddit
Hacker News