Sunday, July 24, 2016

Control break report to PDF with xtopdf

By Vasudev Ram

Hi readers,

Control break reports have been very common in data processing, from the earliest days of computing until today. They are a fundamental kind of report, and the need for them is ubiquitous across many kinds of organizations.

Here is an example program that generates a control break report and writes it to PDF, using xtopdf, my Python toolkit for PDF creation.

The program uses xtopdf to generate the PDF output, and the groupby function from the itertools module to handle the control break logic easily.

I've written multiple control-break report generation programs before, including implementing the logic manually, and it can get a little fiddly to get everything just right (no off-by-one errors, etc.), particularly when there is more than one level of nesting; you have to check for various conditions, set flags, and so on.
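To give a feel for the manual logic mentioned above, here is a minimal sketch of a single-level control break written by hand. It is not code from this post's program: the function name manual_control_break, the sample records, and the per-group subtotal line are all assumptions made for illustration. It assumes the records are already sorted on the region field.

```python
def manual_control_break(records):
    """Return report lines for records pre-sorted on the region field."""
    lines = []
    prev_region = None   # sentinel: no group has been started yet
    subtotal = 0
    for region, item, value in records:
        if region != prev_region:
            # Control break: close the previous group, if any, then open a new one.
            if prev_region is not None:
                lines.append('  Subtotal: %d' % subtotal)
            lines.append(region)
            prev_region = region
            subtotal = 0
        lines.append('  %s %d' % (item, value))
        subtotal += value
    # The last group has no following record to trigger a break,
    # so it has to be closed explicitly after the loop.
    if prev_region is not None:
        lines.append('  Subtotal: %d' % subtotal)
    return lines

# Hypothetical sample data, already sorted by region.
sample = [
    ('East', 'Tablet #5', 200),
    ('East', 'Offline UPS', 250),
    ('West', 'Hard disk #8', 500),
]
for line in manual_control_break(sample):
    print(line)
```

The sentinel check and the extra flush after the loop are the kind of details that are easy to get wrong, and they multiply with each additional level of nesting.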

So it's nice to have Python's itertools.groupby functionality handle it, at least for basic cases. Note that the data needs to be sorted on the grouping key for groupby to work, because it only groups consecutive records that share a key. Here is the code:

from __future__ import print_function

# A program to show how to write simple control break reports
# and send the output to PDF, using itertools.groupby and xtopdf.
# Author: Vasudev Ram
# Copyright 2016 Vasudev Ram

from itertools import groupby
from PDFWriter import PDFWriter

# I hard-code the data here to make the example shorter.
# More commonly, it would be fetched at run-time from a 
# database query or CSV file or similar source.

data = [
    ['North', 'Desktop #1', 1000],
    ['South', 'Desktop #3', 1100],
    ['North', 'Laptop #7', 1200],
    ['South', 'Keyboard #4', 200],
    ['North', 'Mouse #2', 50],
    ['East', 'Tablet #5', 200],
    ['West', 'Hard disk #8', 500],
    ['West', 'CD-ROM #6', 150],
    ['South', 'DVD Drive', 150],
    ['East', 'Offline UPS', 250],
]

pw = PDFWriter('SalesReport.pdf')
pw.setFont('Courier', 12)
pw.setHeader('Sales by Region')
pw.setFooter('Using itertools.groupby and xtopdf')

# Convenience function to both print to screen and write to PDF.
def print_and_write(s, pw):
    print(s)
    pw.writeLine(s)

# Set column headers.
headers = ['Region', 'Item', 'Sale Value']
# Set column widths.
widths = [ 10, 15, 10 ]
# Build header string for report.
header_str = ''.join([hdr.center(widths[ind]) \
    for ind, hdr in enumerate(headers)])
print_and_write(header_str, pw)

# Function to base the sorting and grouping on.
def key_func(rec):
    return rec[0]

# groupby only groups adjacent records, so sort on the same key first.
data.sort(key=key_func)

for region, group in groupby(data, key=key_func):
    print_and_write('', pw)
    # Write group header, i.e. region name.
    print_and_write(region.center(widths[0]), pw)
    # Write group's rows, i.e. sales data for the region.
    for row in group:
        # Build formatted row string.
        row_str = ''.join(str(col).rjust(widths[ind + 1]) \
            for ind, col in enumerate(row[1:]))
        print_and_write(' ' * widths[0] + row_str, pw)

# Finish the report and save the PDF file.
pw.close()

Running it gives this output on the screen:
$ python
  Region        Item     Sale Value

   East
                Tablet #5       200
              Offline UPS       250

  North
               Desktop #1      1000
                Laptop #7      1200
                 Mouse #2        50

  South
               Desktop #3      1100
              Keyboard #4       200
                DVD Drive       150

   West
             Hard disk #8       500
                CD-ROM #6       150
And this is a screenshot of the PDF output, viewed in Foxit PDF Reader:

So the itertools.groupby function provides roughly the same sort of functionality that SQL's GROUP BY clause provides (as part of a complete SELECT statement, of course). The difference is that with Python's groupby, you do the grouping and related processing in your program code, on data that is in memory, whereas if you use SQL via a client-server RDBMS from your program, the grouping and processing happen on the database server and only the aggregate results are sent to your program to process further. Both methods have pros and cons, depending on the needs of the application.
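As a rough illustration of that comparison, the sketch below computes per-region totals both ways. It is only a sketch under assumptions: the totals are not part of the report program above, the table and column names are made up, and the standard library's sqlite3 module (an in-process database) stands in for a client-server RDBMS purely so the example is self-contained.

```python
import sqlite3
from itertools import groupby
from operator import itemgetter

# A few of the sales records from the post, as tuples.
sales = [
    ('North', 'Desktop #1', 1000),
    ('South', 'Desktop #3', 1100),
    ('North', 'Laptop #7', 1200),
    ('East', 'Tablet #5', 200),
]

# In-program grouping: sort on the key first, then let groupby find the breaks.
by_region = itemgetter(0)
totals_py = [
    (region, sum(row[2] for row in rows))
    for region, rows in groupby(sorted(sales, key=by_region), key=by_region)
]

# Database-side grouping: only the aggregate rows come back to the program.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE sales (region TEXT, item TEXT, value INTEGER)')
conn.executemany('INSERT INTO sales VALUES (?, ?, ?)', sales)
totals_sql = conn.execute(
    'SELECT region, SUM(value) FROM sales GROUP BY region ORDER BY region'
).fetchall()
conn.close()

print(totals_py)
print(totals_sql)
```

Both lists hold the same (region, total) pairs. In the groupby version every detail row passes through the program, which is what makes it possible to print the rows as well as the group headers, as the report does.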

In my next post about Python, I'll use this program as one vehicle to demonstrate some uses of randomness in testing, continuing the series titled "The many uses of randomness", the earlier two parts of which are here and here.

- Enjoy.

- Vasudev Ram - Online Python training and consulting


Vasudev Ram said...

While googling for links about the control break concept for this post, I came across this Python recipe on ActiveState, which uses an interesting approach, a state machine:

Control break report generation example (Python recipe)

Had not seen that approach before, though, with hindsight, it does seem like a state machine can be appropriate for this, since the program can be thought of as transitioning through several states, such as a) the initial state before any data is read, b) the state after reading the first record, and then c) one of 3 other possible states: c.1) next record read has same key as previous, c.2) next record read has different key from previous (a state change to a different key group - this is what is called a control break, from which the report gets its name), and c.3) no next record (end of data).
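The states listed above can be sketched as a small generator. This is not the code from the ActiveState recipe; the state names, the event names, and the function control_break_events are all hypothetical, chosen only to mirror the a), b), c.1) to c.3) breakdown. The keys are assumed to be sorted already.

```python
# Illustrative state names; a) is START, everything after the first record is IN_GROUP.
START, IN_GROUP = 'start', 'in_group'

def control_break_events(keys):
    """Yield (event, key) pairs for a sequence of keys pre-sorted for grouping."""
    state = START
    current = None
    for key in keys:
        if state == START:
            # b) first record read: open the first group.
            yield ('begin_group', key)
            state = IN_GROUP
        elif key != current:
            # c.2) key changed: this is the control break.
            yield ('end_group', current)
            yield ('begin_group', key)
        # c.1) same key as before, or a group was just opened: emit the record.
        yield ('record', key)
        current = key
    if state == IN_GROUP:
        # c.3) no next record: close the last group.
        yield ('end_group', current)

events = list(control_break_events(['East', 'East', 'West']))
for event in events:
    print(event)
```

A report writer would react to begin_group by printing the group header and to end_group by printing any subtotals.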

Vasudev Ram said...

Referring to the above comment, on further thought, I realized that I had implemented something roughly equivalent to a custom state machine (spread across multiple methods in the PDFWriter class) in the core xtopdf library (see the xtopdf source code at the Bitbucket link in the post above). That's what takes care of the pagination, printing the header and footer at the right time and place (in the PDF), page numbering, resetting the font (a quirk required by ReportLab), etc. I also realized that the core xtopdf logic itself is conceptually similar to the logic of a control-break report. So the above program sort of implements a control break inside a control break - ha ha.