import pandas as pd
import numpy as np
import xlwings as xw
import os
In this post, I’m going to overview how to use Python’s xlwings
and pywin32
libraries to produce formatted, human-readable tables in Excel. Beyond basic uses of xlwings
, I’ll overview how to leverage pywin32
and the .api
attribute to interact directly with the Excel object model using VBA-like Python code. The approach allows one to produce highly customizable Excel tables that should satisfy any audience that prefers formatted spreadsheet output rather than, say, a quarto doc.
Alongside this blog post, I’ve been developing an excel
submodule in my swiss-code
package, with various wrapper functions for xlwings
and pywin32
. I’ll also use this post to showcase that submodule, and specifically the ExcelDataFrame
class that I’ve developed to integrate a given pandas
dataframe with a corresponding table range in Excel.
Problem Setup
Code for dataset generation
def simulate_df(num_transactions=1000):
1)
np.random.seed(= np.random.randint(1000, 5000, num_transactions)
customer_ids = np.round(np.random.uniform(5, 500, num_transactions), 2)
transaction_amounts = np.random.choice(["Credit Card", "Debit Card", "PayPal", "Cash"], num_transactions)
payment_methods = np.random.choice(["Electronics", "Clothing", "Groceries", "Entertainment"], num_transactions)
categories = pd.date_range(start="2024-01-01", periods=num_transactions, freq="D")
transaction_dates = {
data "transaction_id": range(1, num_transactions + 1),
"customer_id": customer_ids,
"amount": transaction_amounts,
"payment_method": payment_methods,
"category": categories,
"date": transaction_dates
}= pd.DataFrame(data)
df 'year'] = df['date'].dt.year.astype(str)
df["class"] = df["category"].map(
df[
{"Clothing": "Necessity",
"Groceries": "Necessity",
"Electronics": "Discretionary",
"Entertainment": "Discretionary",
}
)return df
Say we have a dataset of customer transactions at a super store. I simulated this data arbitrarily.
= simulate_df()
df df.head()
transaction_id | customer_id | amount | payment_method | category | date | year | class | |
---|---|---|---|---|---|---|---|---|
0 | 1 | 2061 | 177.75 | Debit Card | Electronics | 2024-01-01 | 2024 | Discretionary |
1 | 2 | 1235 | 33.20 | Credit Card | Clothing | 2024-01-02 | 2024 | Necessity |
2 | 3 | 4980 | 118.26 | Credit Card | Groceries | 2024-01-03 | 2024 | Necessity |
3 | 4 | 2096 | 333.73 | Debit Card | Entertainment | 2024-01-04 | 2024 | Discretionary |
4 | 5 | 4839 | 251.14 | Credit Card | Electronics | 2024-01-05 | 2024 | Discretionary |
Let’s imagine the case where a stakeholder asks for a report on the following:
- Total dollars transacted, broken down by key categories and payment method.
It’s fairly straightforward to use pandas
to make a pivot table report, complete with human-readable, title-cased indices and columns. In this case, I’ve aggregated the total dollars spent, dissagregated by payment method and category of good. Some users prefer that the pivot table also include subtotals, and we flexibly add those with margins=True
.
= df.pivot_table(
dollars_by_method ="payment_method",
index="category",
columns="amount",
values="sum",
aggfunc=True,
margins="Total",
margins_name
)= "Payment Method"
dollars_by_method.index.name = "Category"
dollars_by_method.columns.name dollars_by_method
Category | Clothing | Electronics | Entertainment | Groceries | Total |
---|---|---|---|---|---|
Payment Method | |||||
Cash | 15016.00 | 17026.47 | 18208.16 | 18242.88 | 68493.51 |
Credit Card | 16074.92 | 10960.64 | 16052.85 | 20187.25 | 63275.66 |
Debit Card | 14032.23 | 17325.96 | 14363.99 | 17188.67 | 62910.85 |
PayPal | 14143.84 | 15108.84 | 16852.17 | 12065.14 | 58169.99 |
Total | 59266.99 | 60421.91 | 65477.17 | 67683.94 | 252850.01 |
That’s a fine-enough pandas
dataframe, but now the question arises of how to best serve this data to the end-user.
There are several options – lately I’ve been using Quarto reports or else Tableau dashboards for these sorts of analytics projects. However, those formats are very “finalized” and can be a little unsatisfying for end-users who want to further filter or modify table output. If that audience doesn’t use R/Python, access to underlying .ipynb
or .qmd
files will not be helpful. This is the case where I think that spreadsheet output, and specifically reproducible, formatted Excel worksheets, can be very potent.
Getting connected to Excel
To write a pandas
dataframe out to Excel, we’ll work through the Excel/xlwings
object hierarchy:
- Application1 \(\rightarrow\) Work Book \(\rightarrow\) Sheet \(\rightarrow\) Range.
In xlwings
we’ll establish a xw.Book
object, which either creates or opens an existing excel work book, in this case test.xlsx
.
= "test.xlsx"
filename if os.path.exists(filename):
= xw.Book(filename)
wb else:
= xw.Book()
wb
wb.save(filename) wb
<Book [test.xlsx]>
Next, we establish an xw.Sheet
within that xw.Book
object, in this case I’ll call the sheet first_test
.
= 'first_test'
sheet_name try:
= wb.sheets.add(sheet_name)
sheet_new except ValueError:
= wb.sheets[sheet_name]
sheet_new sheet_new
<Sheet [test.xlsx]first_test>
At this point, we can select the sheet, sheet_new
, select a start range, “A1”, then set it’s value with our dataframe object from the previous section, dollars_by_method
. This will export the full dataframe into the spreadsheet.
range("A1").options(index=True, header=True).value = dollars_by_method sheet_new.
Then, when we save the book, we can inspect the output by opening text.xlsx
. Note that the data were entered into cell A1
and expanded as needed.
wb.save()
This is an unattractive table, with no formatting and with poorly fitted columns. This is also equivalent to what we could get if we simply used the df.to_excel()
method within pandas
. We’ll proceed to format this table, that that is where we get the real value out of xlwings
.
Interacting with the Excel Object Model
The most powerful aspect of xlwings
is that it allows for interacting with the Excel object model – an API for programmatically editing Excel spreadsheets. One would typically interact with Excel’s object model via VBA code, but xlwings
and pywin32
together allow for us to use Python for a conceptually similar workflow.
Note that Microsoft provides a few key pieces of documentation that augment xlwings
’ own API reference.
- An overview of the object model concept and typical VBA workflows in in the “VBA Programming 101” section of the Official VBA Reference, and,
- A complete reference for the Excel object model in the Official VBA Reference.
We can access the object model API from an xw.Sheet
object by calling the .api
attribute:
sheet_new.api
<win32com.gen_py.Microsoft Excel 16.0 Object Library._Worksheet instance at 0x2714414253696>
This is an instance of VBA’s Worksheet object, which has a documentation page in the VBA reference.
AutoFit
Python objects have attributes and methods, whereas the VBA Worksheet object has properties and methods. One particularly useful property is UsedRange
, which stores the range of cells that have data. We often access that property when we want to format a table.
sheet_new.api.UsedRange
<win32com.gen_py.Microsoft Excel 16.0 Object Library.Range instance at 0x2712378707792>
For very popular properties and methods from the Excel object model, such as UsedRange
, the xlwings
api has simplified python attributes and methods that allow one to avoid explicity accessing the object model api. E.g. we can call xw.Sheet.used_range
for similar functionality to xw.Sheet.UsedRange
.
sheet_new.used_range
<Range [test.xlsx]first_test!$A$1:$F$6>
I’ll stick with the former approach for now.
From the UsedRange
attribute, which is a Range
object, we can access the EntireColumn property, which selects the entire columns for a given range.
sheet_new.api.UsedRange.EntireColumn
<win32com.gen_py.Microsoft Excel 16.0 Object Library.Range instance at 0x2712379336816>
This is equivalent to the following selection:
For that Range
object, we will now call the Autofit() method.
sheet_new.api.UsedRange.EntireColumn.AutoFit()
True
We get the following in Excel:
Let’s autofit the rows and the columns where there is currently data.
sheet_new.api.UsedRange.EntireColumn.AutoFit()
sheet_new.api.UsedRange.EntireRow.AutoFit() wb.save()
Number Formatting
Now we’ll move onto formatting the data, which represent dollar totals, as follows:
11020
\(\rightarrow\)$11,020
I’ll be using a mix of xlwings
wrapper methods/attributes, and explicit calls to the Excel object model via .api
, with no real commitment as to when to use which. Given that we get equivalent results, I’m of the mind that it doesn’t particularly matter.
We can apply dollar formatting to any range in Excel via the NumberFormat property – number_format
in xlwings
. But to do that, we’ll first need to define a range that captures exactly where there is numerical data.
I’ll first select the top row of the sheet and retrieve its values, which are the column headers:
= sheet_new.range("A1").expand("right").value
headers headers
['Payment Method',
'Clothing',
'Electronics',
'Entertainment',
'Groceries',
'Total']
We now want to access those columns that actually contain data (Payment Method is the index title and doesn’t contain data). We’ll do this by taking the index of Clothing, the first column with data, adding one,2
= headers.index('Clothing') + 1
col_index col_index
2
then retrieving the letter that corresponds to that index:
= xw.utils.col_name(col_index)
col_letter col_letter
'B'
Given the column letter, we can now select all of the data in that column:
= f"{col_letter}2:{col_letter}1000000"
selection selection
'B2:B1000000'
Then we can set the number format (see supported Number format codes):
range(selection).number_format = "$#,###.##"
sheet_new. wb.save()
To format all of the data, we’ll just iterate through the columns and execute that same chunk of code:
for col in dollars_by_method.columns:
= headers.index(col) + 1
col_index = xw.utils.col_name(col_index)
col_letter = f"{col_letter}2:{col_letter}1048576"
selection range(selection).number_format = "$#,###.##"
sheet_new. wb.save()
Text Formatting
Now I’d like to bold-format the index to differentiate it from the data.
We’ll use a similar process to make the column and index headers bold. I’ll translate the dataframe’s headers into an Excel range as follows – I retrieve the numerical index of the last column (indexed from 1)
= dollars_by_method.reset_index().shape[1]
last_column_index last_column_index
6
I translate that into its column letter:
= xw.utils.col_name(last_column_index)
end_col end_col
'F'
Then I use that to define the range of the columns, which are all in row 1:
= f"A1:{end_col}1"
selection selection
'A1:F1'
With that range selection, we can access the .font
attribute in xlwings
, which corresponds to the Font object in the Excel object model, and set bold = True
.
= sheet_new.range(selection)
header_range = True
header_range.font.bold wb.save()
I’ll do the same with the index, capturing its range then setting bold = True
= dollars_by_method.reset_index().shape[0] + 1
end_row = f"A1:A{end_row}"
selection selection
'A1:A6'
= sheet_new.range(selection)
header_range = True
header_range.font.bold wb.save()
Shading and Borders
To finish the formatting, I’d also like to shade in the grand total column/row so as to differentiate it from unaggregated data, and add basic black borders on all cells.
We’ll be doing much the same as before when it comes to shading in rows/columns – we define a range that captures an entire target row/column, then access the appropriate object to format that range. I’ll reuse the end_col
and end_row
values defined above to get the range of the last column
= f"{end_col}1:{end_col}{end_row}"
total_col total_col
'F1:F6'
In excel, we set colors using RGB codes, which I like to get from https://htmlcolorcodes.com/, though any color-swatch site should suffice.
= f"{end_col}1:{end_col}{end_row}"
total_col range(total_col).color = (242, 242, 242)
sheet_new.= f"A{end_row}:{end_col}{end_row}"
total_row range(total_row).color = (242, 242, 242)
sheet_new.print("Total Column:", total_col, "\nTotal Row:", total_row)
wb.save()
Total Column: F1:F6
Total Row: A6:F6
For defining borders, we’ll just access the Borders object from the UsedRange, so that we apply the borders only where there is currently data. We set two properties for the Border:
- LineStyle, e.g. solid versus dashed.
- Weight, e.g. hairline versus thick.
- There are several other properties we can change as needed in the Border object’s properties, but I typically just focus on style and weight.3
= 1
sheet_new.used_range.api.Borders.LineStyle = 2
sheet_new.used_range.api.Borders.Weight wb.save()
A Package approach
xlwings
is great, but sometimes bridging the gap between working on a pandas
dataframe and an Excel object can be a little involved. For example, formatting a column requires finding that column within Excel then programatically defining its range, being mindful of the different numerical indexing systems in Python vs. Excel, potential issues with pd.MultiIndex
objects, etc.
This can require a lot of code, so I defined a set of wrapper functions, and eventually an entire class, ExcelDataFrame
, to facilitate moving between pandas
and Excel. This class is part of the submodule, swiss_code.excel
, which is a part of my larger swiss_code
package.
You can download as follows:
pip install git+https://github.com/peter-amerkhanian/swiss-code.git@main
My environment specific setup
import sys
'../../../swiss-code/src')
sys.path.append(%load_ext autoreload
%autoreload 2
Once you have the package installed, you can import the excel
submodule.
from swiss_code.excel import excel
Given a pandas
dataframe and an Excel sheet, we can establish an ExcelDataFrame
object via the excel.write_df_to_excel
function. This will write the pandas
dataframe into the Excel sheet, and establish an object that can be further manipulated.
= excel.get_or_create_workbook("test.xlsx")
wb = excel.select_sheet('function_sheet', wb)
function_sheet = excel.write_df_to_excel(df=dollars_by_method,
edf =function_sheet,
sheet="A1")
cell_start edf
ACTIVE ExcelDataFrame(sheet=function_sheet, range=$A$1:$F$6, df_shape=(5, 5))
We’ll now complete all the same formatting tasks as before via method calls:
for col in edf.df.columns:
format="$#,###.##")
edf.format_column_data(col,
edf.make_borders()=True, color=None)
edf.format_indices(bold'Total', bold="ignore", color="light_grey")
edf.format_row('Total', bold="ignore", color="light_grey")
edf.format_column( excel.close_out_book(wb)
Once excel.close_out_book(wb)
has been called, the edf
no longer has an active connection to any workbook.
edf
DISCONNECTED ExcelDataFrame(sheet=NA, range=NA, df_shape=(5, 5))
Complex Tables
I’ll show some further functionality of the ExcelDataFrame
class in the case of a much more complicated pivot table. Let’s say that a stakeholder wants something like the previous report, but with more disaggregation:
- Total dollars transacted, broken down by discretionary vs. essential, and broken down by key categories and payment method, for each year available. Include subtotals for each year.
This is a fairly complex pivot table, and requires a little bit for code beyond just pd.pivot_table
. I’ll iterate over years and generate pivot tables with subtotals for each, then I’ll concatenate them together:
# Year Pivots w/ subtotals
= pd.concat(
dollars_by_method_yr 'year'] == year]
[df[df[=["payment_method"],
.pivot_table(index=["class", "category"],
columns="amount",
values="sum",
aggfunc=True,
margins="Total")
margins_name
.reset_index()=year)
.assign(Year'Year', 'payment_method'])
.set_index([for year in ['2024', '2025', '2026']],
=0)
axis= dollars_by_method_yr.fillna(0)
dollars_by_method_yr dollars_by_method_yr
class | Discretionary | Necessity | Total | |||
---|---|---|---|---|---|---|
category | Electronics | Entertainment | Clothing | Groceries | ||
Year | payment_method | |||||
2024 | Cash | 5182.41 | 8661.28 | 5709.71 | 6552.53 | 26105.93 |
Credit Card | 2976.99 | 5334.29 | 5846.53 | 7983.28 | 22141.09 | |
Debit Card | 6997.97 | 4998.41 | 4887.89 | 5603.41 | 22487.68 | |
PayPal | 5649.27 | 7034.34 | 5225.22 | 3945.90 | 21854.73 | |
Total | 20806.64 | 26028.32 | 21669.35 | 24085.12 | 92589.43 | |
2025 | Cash | 6148.02 | 5110.80 | 4651.67 | 7717.81 | 23628.30 |
Credit Card | 4618.44 | 5946.31 | 6843.13 | 5828.80 | 23236.68 | |
Debit Card | 6606.99 | 7087.51 | 5801.20 | 4554.53 | 24050.23 | |
PayPal | 5220.53 | 5592.94 | 6203.22 | 3442.25 | 20458.94 | |
Total | 22593.98 | 23737.56 | 23499.22 | 21543.39 | 91374.15 | |
2026 | Cash | 5696.04 | 4436.08 | 4654.62 | 3972.54 | 18759.28 |
Credit Card | 3365.21 | 4772.25 | 3385.26 | 6375.17 | 17897.89 | |
Debit Card | 3721.00 | 2278.07 | 3343.14 | 7030.73 | 16372.94 | |
PayPal | 4239.04 | 4224.89 | 2715.40 | 4676.99 | 15856.32 | |
Total | 17021.29 | 15711.29 | 14098.42 | 22055.43 | 68886.43 |
Then I’ll attach a grand total at the bottom:
# Grand Total
"Grand Total", ""), :] = (
dollars_by_method_yr.loc[(
df.pivot_table(=["class", "category"],
index="amount",
values="sum",
aggfunc=True,
margins="Total"
margins_name"amount"]
)[
)# Table Formatting
= ["Year", "Payment Method"]
dollars_by_method_yr.index.names = ['Class', 'Category']
dollars_by_method_yr.columns.names dollars_by_method_yr
Class | Discretionary | Necessity | Total | |||
---|---|---|---|---|---|---|
Category | Electronics | Entertainment | Clothing | Groceries | ||
Year | Payment Method | |||||
2024 | Cash | 5182.41 | 8661.28 | 5709.71 | 6552.53 | 26105.93 |
Credit Card | 2976.99 | 5334.29 | 5846.53 | 7983.28 | 22141.09 | |
Debit Card | 6997.97 | 4998.41 | 4887.89 | 5603.41 | 22487.68 | |
PayPal | 5649.27 | 7034.34 | 5225.22 | 3945.90 | 21854.73 | |
Total | 20806.64 | 26028.32 | 21669.35 | 24085.12 | 92589.43 | |
2025 | Cash | 6148.02 | 5110.80 | 4651.67 | 7717.81 | 23628.30 |
Credit Card | 4618.44 | 5946.31 | 6843.13 | 5828.80 | 23236.68 | |
Debit Card | 6606.99 | 7087.51 | 5801.20 | 4554.53 | 24050.23 | |
PayPal | 5220.53 | 5592.94 | 6203.22 | 3442.25 | 20458.94 | |
Total | 22593.98 | 23737.56 | 23499.22 | 21543.39 | 91374.15 | |
2026 | Cash | 5696.04 | 4436.08 | 4654.62 | 3972.54 | 18759.28 |
Credit Card | 3365.21 | 4772.25 | 3385.26 | 6375.17 | 17897.89 | |
Debit Card | 3721.00 | 2278.07 | 3343.14 | 7030.73 | 16372.94 | |
PayPal | 4239.04 | 4224.89 | 2715.40 | 4676.99 | 15856.32 | |
Total | 17021.29 | 15711.29 | 14098.42 | 22055.43 | 68886.43 | |
Grand Total | 60421.91 | 65477.17 | 59266.99 | 67683.94 | 252850.01 |
Getting that complex table into a formatted Excel table presents a couple complications:
- There is a multi-index, which
pandas
presents as merged. In Excel, these merged multi-indexes will just render with the value repeated over the merged range. - We will want to shade and bold all subtotals to match the previous table’s formatting. This will be a little more difficult given that they exist for each year.
= excel.get_or_create_workbook("test.xlsx")
wb = excel.select_sheet('complex_sheet', wb)
complex_sheet = excel.write_df_to_excel(dollars_by_method_yr,
edf
complex_sheet,="A1") cell_start
I’ve defined a suit of methods to deal with this – notably edf.merge_axis
for dealing with any cell merging problems, and I can use simple for loops to format the “total” rows. I also added in some vertical and horizontal alignment options when formatting.
=1, axis=1)
edf.merge_axis(index=1, axis=0)
edf.merge_axis(index=True, color="light_grey", v_align='Center', h_align='Center')
edf.format_indices(boldfor col in edf.df.columns:
format="$#,###.##")
edf.format_column_data(col, for year in df['year'].unique():
'Total'), bold=True, color="light_grey")
edf.format_row((year, 'Grand Total', ''), bold=True, color="light_grey")
edf.format_row(('Total', ''), bold=True, color="light_grey")
edf.format_column((
edf.make_borders() excel.close_out_book(wb)
This package is a work in progress, so feel free to let me know if you have any ideas of how the design could be improved!
Footnotes
Dealing with the Application object in
xlwings
is optional, and the workflow outlined in this blog just ignores it.↩︎Excel indexes from 1 whereas python indexes from 0, so adding 1 will be common in a lot of these workflows↩︎
I found out about these pages through this excellent stackoverflow answer↩︎
Citation
@online{amerkhanian2025,
author = {Amerkhanian, Peter},
title = {Using {The} {Excel} {Object} {Model} in {Python}},
date = {2025-04-20},
url = {https://peter-amerkhanian.com/posts/excel-wings/},
langid = {en}
}