Thursday, November 21, 2013

Publish Microsoft Excel XLSX data to HTML with openpyxl

By Vasudev Ram

I had come across openpyxl, a library by Eric Gazoni, for reading and writing Microsoft Excel XLSX files (Open Office XML), a while ago.

So today I wrote a demo program that reads the data from an XLSX file using openpyxl and writes that data to HTML as a table. Here is a screenshot of the sample XLSX file used, fruits.xlsx (click image to enlarge):

Here is the program,

# Program to convert the data from an XLSX file to HTML.
# Uses the openpyxl library.

# Author: Vasudev Ram -

import openpyxl
from openpyxl import load_workbook

workbook = load_workbook('fruits.xlsx')
worksheet = workbook.get_active_sheet()

html_data = """
        XLSX to HTML demo
        XLSX to HTML demo

ws_range = worksheet.range('A1:H13')
for row in ws_range:
    html_data += "<tr>
    for cell in row:
        if cell.value is None:
            html_data += "<td> + ' ' + "<td>
            html_data += "<td> + str(cell.value) + "<td>
    html_data += "<tr>
html_data += "<table>lt;body>lt;html>

with open("fruits.html", "w") as html_fil:


You can run the program with:
Then the program's HTML output will be in the file fruits.html, a screenshot of which is below (click to enlarge):

- Enjoy.

- Vasudev Ram - Python, C, Linux, databases, open source - training and consulting.

Read all Python posts on my blog.

O'Reilly 50% Ebook Deal of the Day


Philippe ENTZMANN said...

You can use the pandas library :

import pandas
df = pandas.DataFrame.read_excel('myfile.xlsx')
table = df.to_html()

Vasudev Ram said...

Cool, thanks.

Vasudev Ram said...

Note to readers: Though I did replace the HTML elements' angle brackets with the corresponding HTML entities (ampersand "lt" semicolon, ampersand "gt" semicolon, etc.) - in the Python code in the post, it got messed up in a few places. Make the obvious changes after looking at the code, and things will work.

PeterD said...

Pandas is not that simple and the code is wrong as DataFrame does not have a read_excel function (at least in the current pandas version):

import pandas
xd = pandas.ExcelFile('fruits.xlsx')
df = xd.xd.parse(xd.sheet_names[0], header=None, keep_default_na=False)
with open("fruits_pandas.html", "w") as html_fil:
html_fil.write(df.to_html(header=False, index=False))

Blog comments don't keep the python indenting so if anyone wants to use the code, make sure to put the block indenting back into the line following "with...html_fil:" section near the end of the code snippet.

The only annoying thing with pandas is the to_html inserts a <border="1"... value which I could not find in the API definition how to remove it or change its value.

Vasudev Ram said...

Interesting ... thanks.