Thursday, November 21, 2013

Publish Microsoft Excel XLSX data to HTML with openpyxl


By Vasudev Ram

I had come across openpyxl, a library by Eric Gazoni, for reading and writing Microsoft Excel XLSX files (Open Office XML), a while ago.

So today I wrote a demo program that reads the data from an XLSX file using openpyxl and writes that data to HTML as a table. Here is a screenshot of the sample XLSX file used, fruits.xlsx (click image to enlarge):


Here is the program, XLSXtoHTMLdemo.py:
# XLSXtoHTMLdemo.py

# Program to convert the data from an XLSX file to HTML.
# Uses the openpyxl library.

# Author: Vasudev Ram - http://www.dancingbison.com

import openpyxl
from openpyxl import load_workbook

workbook = load_workbook('fruits.xlsx')
worksheet = workbook.get_active_sheet()

html_data = """
<html>
    <head>
        <title>
        XLSX to HTML demo
        <title>
    <head>
    <body>
        <h3>
        XLSX to HTML demo
        <h3>
    <table>
"""

ws_range = worksheet.range('A1:H13')
for row in ws_range:
    html_data += "<tr>
    for cell in row:
        if cell.value is None:
            html_data += "<td> + ' ' + "<td>
        else:
            html_data += "<td> + str(cell.value) + "<td>
    html_data += "<tr>
html_data += "<table>lt;body>lt;html>

with open("fruits.html", "w") as html_fil:
    html_fil.write(html_data)

# EOF

You can run the program with:
python XLSXtoHTMLdemo.py
Then the program's HTML output will be in the file fruits.html, a screenshot of which is below (click to enlarge):


- Enjoy.

- Vasudev Ram - Python, C, Linux, databases, open source - training and consulting.

Read all Python posts on my blog.




O'Reilly 50% Ebook Deal of the Day

5 comments:

Philippe ENTZMANN said...

You can use the pandas library :

import pandas
df = pandas.DataFrame.read_excel('myfile.xlsx')
table = df.to_html()

Vasudev Ram said...

Cool, thanks.

Vasudev Ram said...

Note to readers: Though I did replace the HTML elements' angle brackets with the corresponding HTML entities (ampersand "lt" semicolon, ampersand "gt" semicolon, etc.) - in the Python code in the post, it got messed up in a few places. Make the obvious changes after looking at the code, and things will work.

PeterD said...

Pandas is not that simple and the code is wrong as DataFrame does not have a read_excel function (at least in the current pandas version):

import pandas
xd = pandas.ExcelFile('fruits.xlsx')
df = xd.xd.parse(xd.sheet_names[0], header=None, keep_default_na=False)
with open("fruits_pandas.html", "w") as html_fil:
html_fil.write(df.to_html(header=False, index=False))

Blog comments don't keep the python indenting so if anyone wants to use the code, make sure to put the block indenting back into the line following "with...html_fil:" section near the end of the code snippet.

The only annoying thing with pandas is the to_html inserts a <border="1"... value which I could not find in the API definition how to remove it or change its value.

Vasudev Ram said...

Interesting ... thanks.