Monday, November 23, 2015

Convert XLSX to PDF with Python and xtopdf

By Vasudev Ram


XLSX => PDF

This is a simple application of my xtopdf toolkit, showing how to use it to convert XLSX data, i.e. Microsoft Excel data, to PDF (Portable Document Format). It only converts text data, not the formatting, colors, fonts, etc., that may be present in the Excel file.

For the input, I will use this small Excel file, fruits2.xlsx, which I created. A screenshot of it is below:


Here is the code for XLSXtoPDF.py:
# XLSXtoPDF.py

# Program to convert the data from an XLSX file to PDF.
# Uses the openpyxl library and xtopdf.

# Author: Vasudev Ram - http://jugad2.blogspot.com
# Copyright 2015 Vasudev Ram.

from openpyxl import load_workbook
from PDFWriter import PDFWriter

# data_only=True reads the values cached for formula cells
# instead of the formula text.
workbook = load_workbook('fruits2.xlsx', data_only=True)
worksheet = workbook.active

pw = PDFWriter('fruits2.pdf')
pw.setFont('Courier', 12)
pw.setHeader('XLSXtoPDF.py - convert XLSX data to PDF')
pw.setFooter('Generated using openpyxl and xtopdf')

# Slicing by a range string yields the rows of cells in A1:H13.
ws_range = worksheet['A1:H13']
for row in ws_range:
    s = ''
    for cell in row:
        if cell.value is None:
            s += ' ' * 11
        else:
            s += str(cell.value).rjust(10) + ' '
    pw.writeLine(s)
pw.savePage()
pw.close()
And here is a screenshot of the PDF output in fruits2.pdf:

There are some points worth mentioning in connection with conversion of data to and from PDF. I will discuss them in a follow-up post.
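
The column layout used in XLSXtoPDF.py can be looked at on its own, without openpyxl or xtopdf. Below is a minimal sketch that pulls the inner loop out into a function. The name format_row and its width parameter are my own for illustration and are not part of xtopdf. Each value is right-justified in a 10-character field followed by one space, and an empty cell becomes 11 spaces, so columns line up in a fixed-width font such as Courier.

```python
def format_row(values, width=10):
    """Build one output line from a row of cell values.

    Each value is right-justified in `width` characters and followed by
    one space. An empty cell (None) becomes width + 1 spaces.
    format_row is a hypothetical helper, not part of xtopdf.
    """
    parts = []
    for value in values:
        if value is None:
            parts.append(' ' * (width + 1))
        else:
            parts.append(str(value).rjust(width) + ' ')
    return ''.join(parts)


if __name__ == '__main__':
    # Made-up sample rows, standing in for the cells of a worksheet.
    sample_rows = [('Fruit', 'Qty', None), ('Apple', 10, 2.5)]
    for row in sample_rows:
        print(format_row(row))
```

Note that rjust pads but never truncates, so a value longer than the field width pushes the following columns to the right.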

- Vasudev Ram - Online Python training and programming


8 comments:

Vasudev Ram said...

Just realized the title of the post should really be:

Convert XLSX to PDF with Python, openpyxl and xtopdf.

(though I do mention openpyxl in the comments in the code, and in the body of the post).



Joel Varma said...

import error PDFWriter
how to solve this

Vasudev Ram said...


You need to be a programmer, or at least know how to install the libraries a program uses. In this case the import that fails is PDFWriter. That file (PDFWriter.py) is part of xtopdf. You have to install the xtopdf package before you can run the program, along with the other libraries it uses, such as openpyxl. Search for this in Google:

jugad2 guide to installing and using xtopdf

and the first result or so should be the link you want - instructions on how to install xtopdf (on Windows).

The place to get xtopdf is here:

https://bitbucket.org/vasudevram/xtopdf
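
Before running XLSXtoPDF.py, a quick check like the one below can show which of the modules it imports cannot be found. This is only a sketch using the standard library; the function name missing_modules is made up for illustration. It only reports whether Python can locate each top-level module, not whether the installed version is the right one.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that Python cannot locate.

    missing_modules is a hypothetical helper for illustration only.
    """
    return [name for name in names
            if importlib.util.find_spec(name) is None]


if __name__ == '__main__':
    # The two modules imported at the top of XLSXtoPDF.py.
    missing = missing_modules(['openpyxl', 'PDFWriter'])
    if missing:
        print('Not installed or not on the Python path:', ', '.join(missing))
    else:
        print('All required modules were found.')
```

If PDFWriter is reported as missing, xtopdf is either not installed or its directory is not on the Python path.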

Anonymous said...

Doesn't work. PDFWriter is not a separate module. falls in pdfrw but doesn't support setfont, header, footer, savepage or close

Vasudev Ram said...


Check your facts.

Vasudev Ram said...


>Doesn't work. PDFWriter is not a separate module. falls in pdfrw but doesn't support setfont, header, footer, savepage or close

In case you didn't get it yet:

I don't know where you got the idea that pdfrw is used. The post uses my xtopdf toolkit, not pdfrw, and xtopdf has a PDFWriter class. The post clearly says that it uses xtopdf. I even linked the word xtopdf to a Google search for xtopdf. A comment on the post also says where to get xtopdf.

In future, read posts and comments fully before commenting.

Bogdan Chakis said...

Hello, I do not understand whether formulas can be converted. If I do something like this: worksheet.cell(row=6, column=1, value='=SUM(A3:A5)')
then after the conversion I get nothing in the cell at those coordinates, although the xlsx file shows the correct value in that cell. I used the data_only flag. Please help me.

Vasudev Ram said...

@Bogdan: I have not attempted to support formulas. Generally I only extract text content, not formatting, from data sources and write it to PDF. I did the same when I used xlrd, though later versions of xlrd do support some cell formatting extraction. The goal of xtopdf is not to support formulas and cell formatting, only conversion of text to PDF output, with added pagination, headers, footers and page numbers.
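
A likely reason for the empty cell: openpyxl does not evaluate formulas. With data_only=True it returns the result that a spreadsheet application cached in the file the last time it saved it. A formula written by a library and never opened and saved in such an application has no cached result, so the value comes back as None. The sketch below shows this at the file-format level using only the standard library. The two XML fragments are hand-written examples of a sheet cell, not taken from a real file, and formula_and_cached_value is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Namespace used by worksheet XML inside an XLSX file.
NS = {'m': 'http://schemas.openxmlformats.org/spreadsheetml/2006/main'}

# Hand-written example: cell A6 as a spreadsheet application might save it,
# with the formula in <f> and the cached result in <v>.
SAVED_BY_SPREADSHEET_APP = (
    '<c xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" '
    'r="A6"><f>SUM(A3:A5)</f><v>60</v></c>'
)

# Hand-written example: the same cell with a formula but no cached result.
WRITTEN_WITHOUT_RESULT = (
    '<c xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" '
    'r="A6"><f>SUM(A3:A5)</f><v/></c>'
)


def formula_and_cached_value(cell_xml):
    """Return (formula text, cached value) for one cell element.

    The cached value is None when <v> is missing or empty.
    """
    cell = ET.fromstring(cell_xml)
    formula = cell.findtext('m:f', default=None, namespaces=NS)
    cached = cell.findtext('m:v', default=None, namespaces=NS)
    return formula, (cached or None)


if __name__ == '__main__':
    print(formula_and_cached_value(SAVED_BY_SPREADSHEET_APP))
    print(formula_and_cached_value(WRITTEN_WITHOUT_RESULT))
```

Opening the workbook in a spreadsheet application and saving it again should store the computed result, after which data_only=True can return it.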