Thursday, December 25, 2014

Create tabular PDF reports with Python, xtopdf and tablib

By Vasudev Ram


Tablib is a Python library that allows you to import, export and manipulate tabular data.

I had come across tablib a while ago. Today I thought of using it with xtopdf, my Python library for PDF creation, to generate PDF output from tabular data. So I wrote a program, TablibToPDF.py, for that. It generates dummy data for student grades (for an examination), then puts that data into a tablib Dataset, and then exports the contents of that Dataset to PDF, using xtopdf. Given the comments in the code, it is mostly self-explanatory. I first wrote the program in an obvious/naive way, and then improved it a little by removing some intermediate variables, and by converting some for loops to list comprehensions, thereby shortening the code by a few lines. Here is the code for TablibToPDF.py:
"""
TablibToPDF.py
Author: Vasudev Ram
Copyright 2014 Vasudev Ram - www.dancingbison.com
This program is a demo of how to use the tablib and xtopdf Python libraries 
to generate tabular data reports as PDF output.
Tablib is at: https://tablib.readthedocs.org/en/latest/
xtopdf is at: https://bitbucket.org/vasudevram/xtopdf
and info about xtopdf is at: http://slides.com/vasudevram/xtopdf or 
at: http://slid.es/vasudevram/xtopdf
"""

import random
import tablib
from PDFWriter import PDFWriter

# Helper function to output a string to both screen and PDF.
def print_and_write(pw, strng):
    print strng
    pw.writeLine(strng)

# Set up grade and result names and mappings.
grade_letters = ['F', 'E', 'D', 'C', 'B', 'A']
results = {'A': 'Pass', 'B': 'Pass', 'C': 'Pass', 
    'D': 'Pass', 'E': 'Pass', 'F': 'Fail'}

# Create an empty Dataset and set its headers.
data = tablib.Dataset()
data.headers = ['ID', 'Name', 'Marks', 'Grade', 'Result']
widths = [5, 12, 8, 8, 12] # Display widths for columns.

# Create some rows of student data and use it to populate the Dataset.
# Columns for each student row correspond to the header columns 
# shown above.

for i in range(20):
    id = str(i).zfill(2)
    name = 'Student-' + id
    # Let's grade them on the curve [1].
    # This examiner doesn't give anyone 100 marks :)
    marks = random.randint(40, 99)
    # Compute grade from marks.
    grade = grade_letters[(marks - 40) / 10]
    result = results[grade]
    columns = [id, name, marks, grade, result]
    row = [ str(col).center(widths[idx]) for idx, col in enumerate(columns) ]
    data.append(row)

# Set up the PDFWriter.
pw = PDFWriter('student_grades.pdf')
pw.setFont('Courier', 10)
pw.setHeader('Student Grades Report - generated by xtopdf')
pw.setFooter('xtopdf: http://slides.com/vasudevram/xtopdf')

# Generate header and data rows as strings; output them to screen and PDF.

separator = '-' * sum(widths)
print_and_write(pw, separator)

# Output headers
header_strs = [ header.center(widths[idx]) for idx, header in enumerate(data.headers) ]
print_and_write(pw, ''.join(header_strs))
print_and_write(pw, separator)

# Output data
for row in data:
    print_and_write(pw, ''.join(row))

print_and_write(pw, separator)
pw.close()

# [1] http://en.wikipedia.org/wiki/Grading_on_a_curve
# I'm not endorsing the idea of grading on a curve; I only used it as a 
# simple algorithm to generate the marks and grades for this example.

You can run it with:
$ python TablibToPDF.py
It sends the tabular output that it generates, to both the screen and to a PDF file named student_grades.pdf.
Here is a screenshot of the generated PDF file, opened in Foxit PDF Reader:


The program that I wrote could actually have been written without using tablib, just with plain Python lists and/or dictionaries. But tablib has some additional features, such as dynamic columns, export to various formats (but not PDF), and more - see its documentation, linked near the top of this post. I may write another blog post later that explores the use of some of those tablib features.

- Enjoy.

Vasudev Ram - Python consulting and training - Dancing Bison Enterprises

Signup to hear about new products or services from me.

Contact Page

5 comments:

Singularity said...

Hello,
I'm a python beginner, I would like to you use your script example, but instead of generating a dataset, I need to take the data from a mysql db. I successfully make the connection, but I miss the piece of code to pass the ouput to tablib. Any ideas?

Vasudev Ram said...

That is straightforward.

You can get the sample / template code for fetching your data from a MySQL db from any tutorial about using Python with MySQL using the Python DB API - google for it. Either the native MySQLdb driver or the pure-Python PyMySQL (which I blogged about earlier) should work.

After making the connection to the db, just create an SQL select statement (as a string) appropriate for your table and columns, fire the query to the db/table using that connection and statement, getting back a cursor, iterate over it with a for statement, and fetch the required columns from the row you get in each iteration.

Put the values of those columns into the tablib dataset, which you should create before the for loop.

The rest of the code (for generating PDF) will then be similar to the code in my post above - only the column names should differ.

Or you could also use SQLAlchemy or another Python ORM, or PyDAL - which I blogged about recently, again in connection with xtopdf.

See http://jugad2.blogspot.com/search/label/python for my previous Python posts.

HTH.

Vasudev Ram said...

But make sure to read up on the basics of databases and SQL, including both standalone and how to use them from Python via the DB API, or an ORM - if you don't already know that - if you're interested in doing well in the long term, as opposed to just solving your current issue. Actually you should know the DB API method even if planning to use and ORM, because it underlies it.

I offer online training on Python and SQL (which can be customized, in some cases). Consider contacting me about that, if interested, via my web site: dancingbison.com/contact.html - use the Gmail address there.

Singularity said...

Thanks for your response.
Actually I've succesfully managed to connect and retrieve data with something like:


# Prepare sql query
query = "SELECT * FROM users"
try:
# Execute the SQL command
cursor.execute(query)
# Fetch all the rows in a list of lists.
res = cursor.fetchall()
for row in res:
id = row[0]
status = row[1]
name = row[2]
surname = row[3]
ip = row[4]
# Now i simply print fetched result
print "id=%s,status=%s,nome=%s,cognome=%s,ip=%s" % \
(id, status, name, surname, ip)
except:
print "Error:check db connection"




Now I need learn more about interacting with tablib instead of just printing the data :-)

Vasudev Ram said...

Great, good to hear you solved it.