Monday, November 23, 2015

Convert XLSX to PDF with Python and xtopdf

By Vasudev Ram


XLSX => PDF

This is a simple application of my xtopdf toolkit, showing how to use it to convert XLSX data, i.e. Microsoft Excel data, to PDF (Portable Document Format). It only converts text data, not the formatting, colors, fonts, etc., that may be present in the Excel file.

For the input, I will use this small Excel file, fruits2.xlsx, which I created. A screenshot of it is below (click to enlarge):


Here is the code for XLSXtoPDF.py:
# XLSXtoPDF.py

# Program to convert the data from an XLSX file to PDF.
# Uses the openpyxl library and xtopdf.

# Author: Vasudev Ram - http://jugad2.blogspot.com
# Copyright 2015 Vasudev Ram.

from openpyxl import load_workbook
from PDFWriter import PDFWriter

workbook = load_workbook('fruits2.xlsx', guess_types=True, data_only=True)
worksheet = workbook.active

pw = PDFWriter('fruits2.pdf')
pw.setFont('Courier', 12)
pw.setHeader('XLSXtoPDF.py - convert XLSX data to PDF')
pw.setFooter('Generated using openpyxl and xtopdf')

ws_range = worksheet.iter_rows('A1:H13')
for row in ws_range:
    s = ''
    for cell in row:
        if cell.value is None:
            s += ' ' * 11
        else:
            s += str(cell.value).rjust(10) + ' '
    pw.writeLine(s)
pw.savePage()
pw.close()
And here is a screenshot of the PDF output in fruits2.pdf:

There are some points worth mentioning in connection with conversion of data to and from PDF. I will discuss them in a follow-up post.

- Vasudev Ram - Online Python training and programming

Signup to hear about new products and services I create.

Posts about Python  Posts about xtopdf

My ActiveState recipes

15 comments:

Vasudev Ram said...

Just realized the title of the post should really be:

Convert XLSX to PDF with Python, openpyxl and Xtopdf.

(though I do mention openpyxl in the comments in the code, and in the body of the post).



Unknown said...

import error PDFWriter
how to solve this

Vasudev Ram said...


You need to be a programmer or at least to know how to install needed libraries used by the program. In this case the one giving the error is PDFWriter. That file (PDFWriter.py) is part of xtopdf. You have to install the xtopdf package before you can run the program (and also other libraries it uses, such as openpyxl). Search for this in Google:

jugad2 guide to installing and using xtopdf

and the first result or so should be the link you want - instructions on how to install xtopdf (on Windows).

The place to get xtopdf is here:

https://bitbucket.org/vasudevram/xtopdf

Anonymous said...

Doesn't work. PDFWriter is not a separate module. falls in pdfrw but doesn't supposrt setfont, header, footer, savepage or close

Vasudev Ram said...


Check your facts.

Vasudev Ram said...


>Doesn't work. PDFWriter is not a separate module. falls in pdfrw but doesn't supposrt setfont, header, footer, savepage or close

In case you didn't get it yet:

Don't know from where you came up with the idea that pdfrw is used. My xtopdf toolkit is used in the post, not pdfrw, and it has a PDFWriter class. And the post clearly says that it uses xtopdf. I even linked the word xtopdf to a google search for xtopdf. A comment on the post also says where to get xtopdf.

Next time onward, read posts and comments fully before commenting.

Unknown said...

Hello, I do not understand if formulas are allowed to be converted. I mean if I try to do something like this: worksheet.cell(row=6, column=1, value='=SUM(A3:A5)')
then after the converting procedure I get nothing at cell with those coordinates. Though in xlsx file in that cell the correct value. I used data_only flag. Please, help me.

Vasudev Ram said...

@Bogdan: I have not attempted to support formulas. Generally I only go for getting text content, not formatting, from data sources, and putting it to PDF. Did the same when I used xlrd, though later versions of do support some cell formatting extraction. The goal of xtopdf is not to support formulas and cell formatting, only conversion of text, with added pagination and headers and footers and page numbers, to PDF output.

Unknown said...

line 388
print "i = ", i
^
SyntaxError: Missing parentheses in call to 'print'

i tyied to use PDFWriter but I faced this error..
i don`t know why this kind of error appeared

Vasudev Ram said...

@Unknown: You must be using Python 3. xtopdf does not yet support Python 3, though I have plans to work on that at some time in the near future. The actual work is small, have just not got around to it yet. For now, either use Python 2, or if you are a developer (even if not a Python one), it should be easy for you to convert a local copy of the code to Python 3 so it works. with some googling for info.

You can follow me on Gumroad - https://gumroad.com/vasudevram/follow - if you want to get notified about when the porting of xtopdf to Python 3 is done, or subscribe to my blog - see the blog header for how - I will probably announce it on both those channels.

Forest Ranger said...

Thank you very much for this code. It helps me a lot in the current project I am working on. There's just one hitch, when I execute this code the Excel workbook will always prompt whether to save it or not. I hope somebody can help me on this. Thank you very much in advance

Vasudev Ram said...


Are you looking for training on Python programming, SQL programming and database design, or Unix / Linux architecture, usage, commands and shell scripting?

Visit my training page and check out the courses I offer.

Vasudev Ram said...

I conduct courses on:

- Python programming
- Linux commands & shell scripting
- SQL programming and database design
- PDF report generation using ReportLab and xtopdf

xtopdf is my own product, a Python toolkit for PDF generation from other formats.

Check out my course outlines and testimonials.

More courses will be added over time.

Sign up to be notified of my new courses

Sharan said...

what is s += ' ' * 11?

Vasudev Ram said...

@Sharan:

>what is s += ' ' * 11?

Basic Python stuff: adds a string of 11 spaces to string s.