Working with Excel in Python

Read & write Excel files with pandas and openpyxl: sheets, engines, formatting, formulas, dates, and performance tips.

Quiz progress
0 / 10 answered
Contents

Fundamentals #

What you’ll use
  • pandas.read_excel() / DataFrame.to_excel() for fast, high-level workflows.
  • openpyxl for low-level control of .xlsx/.xlsm (cells, styles, freeze panes, formulas).
  • “Engine” matters: openpyxl (xlsx/xlsm), xlsxwriter (write xlsx with rich formatting), pyxlsb (read xlsb), xlrd 1.2.0 (legacy .xls read).
# Quick tour with pandas
import pandas as pd

df = pd.DataFrame({"id":[1,2], "name":["Ada","Grace"]})
df.to_excel("people.xlsx", index=False, sheet_name="People")   # write

df2 = pd.read_excel("people.xlsx", sheet_name="People")        # read
print(df2.head())
“Use pandas for tables; drop down to openpyxl/xlsxwriter when you need Excel-specific features.”
— Pythonic advice

Quiz: Fundamentals

1. Which function reads an Excel file into a DataFrame?
2. Which library lets you style cells and write formulas in .xlsx?

Reading Excel #

Selective loading with pandas
  • Choose a sheet: sheet_name="Sheet1" or index 0. Use None for all sheets (returns dict).
  • Limit columns/rows: usecols="A:C" or list; skiprows=, nrows=.
  • Types & missing: dtype={"id":int}, na_values=["NA",""], parse_dates=[... ].
import pandas as pd

orders = pd.read_excel(
    "orders.xlsx",
    sheet_name="2025-Q1",
    usecols="A:D",
    dtype={"qty":"Int64"},
    na_values=["", "NA"],
    parse_dates=["order_date"]
)
print(orders.dtypes)

Quiz: Reading

3. Which argument selects only “A:C” columns while reading?
4. How do you read all sheets with pandas?

Writing Excel #

Single & multi-sheet output
  • DataFrame.to_excel("out.xlsx", index=False) to avoid the index column.
  • Multiple sheets: pd.ExcelWriter(...) context, call to_excel(..., sheet_name=...) for each.
  • Append to existing file: ExcelWriter(..., mode="a", if_sheet_exists="replace") (depends on engine).
import pandas as pd

with pd.ExcelWriter("report.xlsx", engine="xlsxwriter") as xw:
    df_sales.to_excel(xw, index=False, sheet_name="Sales")
    df_costs.to_excel(xw, index=False, sheet_name="Costs")

Quiz: Writing

5. How do you prevent the index from being written as a column?
6. What’s the idiomatic way to write multiple sheets in one file?

Formatting, Freeze, Formulas #

Two approaches
  • pandas + xlsxwriter: easy formatting via workbook/worksheet objects.
  • openpyxl: manipulate cells directly (styles, freeze panes, formulas, keep VBA).
# pandas + xlsxwriter: freeze panes & set column width
with pd.ExcelWriter("styled.xlsx", engine="xlsxwriter") as xw:
    df.to_excel(xw, index=False, sheet_name="Data")
    ws = xw.sheets["Data"]
    ws.freeze_panes(1, 0)       # freeze first row
    ws.set_column("A:A", 12)    # width
    ws.write_formula("D2", "=B2*C2")  # formula

# openpyxl: styles / formulas / freeze
from openpyxl import Workbook
from openpyxl.styles import Font
wb = Workbook()
ws = wb.active
ws.title = "Data"
ws.append(["name","qty","price","total"])
ws.append(["Widget", 3, 9.99, "=B2*C2"])
ws["A1"].font = Font(bold=True)
ws.freeze_panes = "A2"
wb.save("styled_openpyxl.xlsx")

Quiz: Formatting & Formulas

7. How do you freeze panes at B2 using openpyxl?
8. With the xlsxwriter engine, how do you add a formula to D2?

Engines & Compatibility #

Which engine for which file?
  • .xlsx/.xlsm: read/write with openpyxl; rich formatting on write with xlsxwriter.
  • .xls (Excel 97-2003): read with xlrd==1.2.0 (legacy) or convert via Excel/LibreOffice first.
  • .xlsb: read via pyxlsb engine in pandas; writing is not generally supported.
  • Macros: preserve by loading with openpyxl.load_workbook(..., keep_vba=True) and saving as .xlsm.
# Preserve macros with openpyxl
from openpyxl import load_workbook
wb = load_workbook("macro_book.xlsm", keep_vba=True)
# ... modify values/styles ...
wb.save("macro_book_out.xlsm")

Quiz: Engines

9. To read .xlsb with pandas, which engine should you use?
10. How do you preserve VBA macros when editing an .xlsm file?

Dates & Pitfalls #

Common gotchas
  • Excel stores dates as serial numbers; beware the 1900 “leap-year” bug. Prefer pandas parse_dates and set date_format on write.
  • To read displayed values of formulas with openpyxl, load with data_only=True (requires the file to have been calculated by Excel).
  • Large sheets: prefer openpyxl.load_workbook(..., read_only=True) and iterate rows; pandas read_excel loads into memory (no chunksize).
# Read displayed values from formulas (not the formulas)
from openpyxl import load_workbook
wb = load_workbook("calc.xlsx", data_only=True)
ws = wb.active
print(ws["D2"].value)  # last-calculated result

Final Quiz & Summary #

Review your performance and revisit questions you missed.

Category: python · Lesson: excel-basics
Learning by Examples
Edit the code and see the result in the console panel.
# Excel playground
# This tries to use openpyxl if available; otherwise prints a helpful message.

from io import BytesIO
print("— Excel I/O demo —")

try:
    from openpyxl import Workbook, load_workbook
    from openpyxl.styles import Font

    # Create an in-memory workbook
    wb = Workbook()
    ws = wb.active
    ws.title = "Data"
    ws.append(["name","qty","price","total"])
    ws.append(["Widget", 3, 9.99, "=B2*C2"])
    ws["A1"].font = Font(bold=True)
    ws.freeze_panes = "A2"

    buf = BytesIO()
    wb.save(buf)
    print("Created workbook in memory with 2 rows + header.")
    print("Cell D2 is a formula:", ws["D2"].value)

    # Read back displayed values (not formulas) if pre-calculated
    buf.seek(0)
    wb2 = load_workbook(buf, data_only=True)
    ws2 = wb2["Data"]
    print("Reading back (data_only=True) D2 value:", ws2["D2"].value)

except Exception as e:
    print("openpyxl not available in this environment.")
    print("To run locally:  pip install openpyxl")
    print("Then create a workbook, write rows, set styles, and save to .xlsx.")
    print("Error detail:", type(e).__name__, "-", str(e))