Thursday, November 21, 2013

Publish Microsoft Excel XLSX data to HTML with openpyxl


By Vasudev Ram

I had come across openpyxl, a library by Eric Gazoni, for reading and writing Microsoft Excel XLSX files (Open Office XML), a while ago.

So today I wrote a demo program that reads the data from an XLSX file using openpyxl and writes that data to HTML as a table. Here is a screenshot of the sample XLSX file used, fruits.xlsx (click image to enlarge):


Here is the program, XLSXtoHTMLdemo.py:
# XLSXtoHTMLdemo.py

# Program to convert the data from an XLSX file to HTML.
# Uses the openpyxl library.

# Author: Vasudev Ram - http://www.dancingbison.com

import openpyxl
from openpyxl import load_workbook

workbook = load_workbook('fruits.xlsx')
worksheet = workbook.get_active_sheet()

html_data = """
<html>
    <head>
        <title>
        XLSX to HTML demo
        <title>
    <head>
    <body>
        <h3>
        XLSX to HTML demo
        <h3>
    <table>
"""

ws_range = worksheet.range('A1:H13')
for row in ws_range:
    html_data += "<tr>
    for cell in row:
        if cell.value is None:
            html_data += "<td> + ' ' + "<td>
        else:
            html_data += "<td> + str(cell.value) + "<td>
    html_data += "<tr>
html_data += "<table>lt;body>lt;html>

with open("fruits.html", "w") as html_fil:
    html_fil.write(html_data)

# EOF

You can run the program with:
python XLSXtoHTMLdemo.py
Then the program's HTML output will be in the file fruits.html, a screenshot of which is below (click to enlarge):


- Enjoy.

- Vasudev Ram - Python, C, Linux, databases, open source - training and consulting.

Read all Python posts on my blog.




O'Reilly 50% Ebook Deal of the Day

7 comments:

Philippe ENTZMANN said...

You can use the pandas library :

import pandas
df = pandas.DataFrame.read_excel('myfile.xlsx')
table = df.to_html()

Vasudev Ram said...

Cool, thanks.

Vasudev Ram said...

Note to readers: Though I did replace the HTML elements' angle brackets with the corresponding HTML entities (ampersand "lt" semicolon, ampersand "gt" semicolon, etc.) - in the Python code in the post, it got messed up in a few places. Make the obvious changes after looking at the code, and things will work.

PeterD said...

Pandas is not that simple and the code is wrong as DataFrame does not have a read_excel function (at least in the current pandas version):

import pandas
xd = pandas.ExcelFile('fruits.xlsx')
df = xd.xd.parse(xd.sheet_names[0], header=None, keep_default_na=False)
with open("fruits_pandas.html", "w") as html_fil:
html_fil.write(df.to_html(header=False, index=False))

Blog comments don't keep the python indenting so if anyone wants to use the code, make sure to put the block indenting back into the line following "with...html_fil:" section near the end of the code snippet.

The only annoying thing with pandas is the to_html inserts a <border="1"... value which I could not find in the API definition how to remove it or change its value.

Vasudev Ram said...

Interesting ... thanks.

Dany R said...

Hi Ram, New to python and tried the above code. but not able to locate correct places for double quotes..tried with as much as i know but no luck. can you pls correct and post the right code. Actually i just want to convert spreadsheet data into html. no headers/footers.

Thanks in advance!!

Vasudev Ram said...


Hi Dany,

The code worked fine for me.

If you are new to Python, you are likely to have other problems too - with any non-trivial code, not just with this one. So I suggest you 1) spend some time learning Python more, which will be beneficial in the long run too, not just for your current need, and 2) for now, for your need to convert spreadsheet to HTML, search for some utility that can do it for you without any programming knowledge. You can try Zamzar and other such sites.

HTH.