Sunday, July 24, 2016

Control break report to PDF with xtopdf

By Vasudev Ram

Hi readers,

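Control break reports have been very common in data processing from the earliest days of computing until today. In such a report, the records are sorted on a key field, and each time the key value changes (the "control break"), the current group is ended and a new one is started, often with a group header or subtotal line. They are a fundamental kind of report, and the need for them is ubiquitous across many kinds of organizations.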

Here is an example program that generates a control break report and writes it to PDF, using xtopdf, my Python toolkit for PDF creation.

The program is named ControlBreakToPDF.py. It uses xtopdf to generate the PDF output, and the groupby function from the itertools module to handle the control break logic easily.

I've written multiple control-break report generation programs before, including ones that implement the logic manually, and it can get a little fiddly to get everything just right (e.g. no off-by-one errors), particularly when there is more than one level of nesting: you have to check for various conditions, set flags, and so on.
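For comparison, here is a minimal sketch of a single-level control break done by hand, without groupby. It assumes the rows are (key, item, value) triples already sorted on the key; the function name manual_control_break and the subtotal lines are hypothetical additions, there only to show the end-of-group handling that makes the manual approach fiddly.

```python
def manual_control_break(rows):
    """Return report lines for (key, item, value) rows sorted on key.

    A group header is emitted when the key changes (the "control break")
    and a subtotal line when a group ends. The last group must be closed
    explicitly after the loop, which is easy to forget.
    """
    lines = []
    prev_key = None
    subtotal = 0
    started = False  # flag: has any group been opened yet?
    for key, item, value in rows:
        if started and key != prev_key:
            # Control break: close the previous group.
            lines.append('  Subtotal %s: %d' % (prev_key, subtotal))
        if not started or key != prev_key:
            # Open a new group.
            lines.append(key)
            prev_key = key
            subtotal = 0
            started = True
        lines.append('  %s %d' % (item, value))
        subtotal += value
    if started:
        # Close the final group; no later key change will trigger it.
        lines.append('  Subtotal %s: %d' % (prev_key, subtotal))
    return lines

rows = sorted([
    ['North', 'Desktop #1', 1000],
    ['South', 'Keyboard #4', 200],
    ['North', 'Mouse #2', 50],
], key=lambda r: r[0])

for line in manual_control_break(rows):
    print(line)
```

Note the two separate checks at the top of the loop and the extra check after it; with more nesting levels, each level needs its own copy of this logic.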

So it's nice to have Python's itertools.groupby function handle it, at least for basic cases. Note that the data needs to be sorted on the grouping key for groupby to work as intended, because groupby only groups consecutive items that have equal keys. Here is the code for ControlBreakToPDF.py:
from __future__ import print_function

# ControlBreakToPDF.py
# A program to show how to write simple control break reports
# and send the output to PDF, using itertools.groupby and xtopdf.
# Author: Vasudev Ram
# Copyright 2016 Vasudev Ram
# http://jugad2.blogspot.com
# https://gumroad.com/vasudevram

from itertools import groupby
from PDFWriter import PDFWriter

# I hard-code the data here to make the example shorter.
# More commonly, it would be fetched at run-time from a 
# database query or CSV file or similar source.

data = \
[
    ['North', 'Desktop #1', 1000],
    ['South', 'Desktop #3', 1100],
    ['North', 'Laptop #7', 1200],
    ['South', 'Keyboard #4', 200],
    ['North', 'Mouse #2', 50],
    ['East', 'Tablet #5', 200],
    ['West', 'Hard disk #8', 500],
    ['West', 'CD-ROM #6', 150],
    ['South', 'DVD Drive', 150],
    ['East', 'Offline UPS', 250],
]

pw = PDFWriter('SalesReport.pdf')
pw.setFont('Courier', 12)
pw.setHeader('Sales by Region')
pw.setFooter('Using itertools.groupby and xtopdf')

# Convenience function to both print to screen and write to PDF.
def print_and_write(s, pw):
    print(s)
    pw.writeLine(s)

# Set column headers.
headers = ['Region', 'Item', 'Sale Value']
# Set column widths.
widths = [ 10, 15, 10 ]
# Build header string for report.
header_str = ''.join(hdr.center(widths[ind])
    for ind, hdr in enumerate(headers))
print_and_write(header_str, pw)

# Function to base the sorting and grouping on.
def key_func(rec):
    return rec[0]

data.sort(key=key_func)

for region, group in groupby(data, key=key_func):
    print_and_write('', pw)
    # Write group header, i.e. region name.
    print_and_write(region.center(widths[0]), pw)
    # Write group's rows, i.e. sales data for the region.
    for row in group:
        # Build formatted row string.
        row_str = ''.join(str(col).rjust(widths[ind + 1])
            for ind, col in enumerate(row[1:]))
        print_and_write(' ' * widths[0] + row_str, pw)
pw.close()
Running it gives this output on the screen:
$ python ControlBreakToPDF.py
  Region        Item     Sale Value

   East
                Tablet #5       200
              Offline UPS       250

  North
               Desktop #1      1000
                Laptop #7      1200
                 Mouse #2        50

  South
               Desktop #3      1100
              Keyboard #4       200
                DVD Drive       150

   West
             Hard disk #8       500
                CD-ROM #6       150

$
And this is a screenshot of the PDF output, viewed in Foxit PDF Reader:


So itertools.groupby provides roughly the same kind of functionality that SQL's GROUP BY clause provides (when used as part of a complete SELECT statement). One difference is that groupby only groups consecutive items with equal keys, which is why the data must be sorted first; GROUP BY has no such requirement. Another is that with Python's groupby, you do the grouping and related processing in your program code, on data that is in memory, while if you use SQL via a client-server RDBMS, the grouping and processing happen on the database server, and only the aggregated results are sent to your program for further processing. Both approaches have pros and cons, depending on the needs of the application.
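To make the GROUP BY comparison concrete, here is a small sketch that computes the total sale value per region with groupby and sum, roughly what SELECT region, SUM(sale_value) ... GROUP BY region would return (the column name sale_value is hypothetical). It uses a subset of the sample data above and sorted() instead of the program's in-place data.sort(), and also shows what groupby does when the data is not sorted first.

```python
from itertools import groupby
from operator import itemgetter

data = [
    ['North', 'Desktop #1', 1000],
    ['South', 'Desktop #3', 1100],
    ['North', 'Laptop #7', 1200],
    ['East', 'Tablet #5', 200],
]

region_key = itemgetter(0)  # same role as key_func in the program above

# Roughly: SELECT region, SUM(sale_value) FROM ... GROUP BY region
totals = [(region, sum(row[2] for row in rows))
          for region, rows in groupby(sorted(data, key=region_key),
                                      key=region_key)]
print(totals)

# Without sorting, groupby only merges *consecutive* rows with equal keys,
# so 'North' shows up as two separate groups here.
unsorted_keys = [region for region, _ in groupby(data, key=region_key)]
print(unsorted_keys)
```

The first print gives [('East', 200), ('North', 2200), ('South', 1100)]; the second gives ['North', 'South', 'North', 'East'].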

In my next post about Python, I'll use this program as one vehicle to demonstrate some uses of randomness in testing, continuing the series titled "The many uses of randomness", the earlier two parts of which are here and here.

- Enjoy.

- Vasudev Ram - Online Python training and consulting

Follow me on Gumroad to be notified about my new products:




My Python posts     Subscribe to my blog by email

My ActiveState recipes



Thursday, July 21, 2016

Testing Gumroad product purchase code embed - vi quickstart tutorial

By Vasudev Ram

The Gumroad product purchase embed is below. Click the vi tutorial link below to buy it.


A vi quickstart tutorial

- Vasudev Ram - Online Python training and consulting




Friday, July 15, 2016

Test post - ignore.

By Vasudev Ram

Test - ignore.
- Vasudev Ram - Online Python training and consulting



Monday, July 11, 2016

The many uses of randomness - Part 2

By Vasudev Ram


Denarius image attribution

Hi, readers,

In my previous (and first) post on randomness, titled "The many uses of randomness", I described some uses of random numbers related to floats, and ended by saying I would continue in the next post with other uses, such as for strings (and things).

This is that next post (delayed somewhat, sorry about that).

Assume that the following statements have been executed first in your Python program or in your Python shell:
from __future__ import print_function
import string
from random import random, randint, randrange, choice, shuffle
Let's look now at the use of random numbers to generate random character and string data.

First, let's generate a few different kinds of random characters:

1) Random characters from the range of 7-bit ASCII characters, i.e. the characters with ASCII codes 0 to 127. This expression generates a single ASCII character:
chr(randint(0, 127))
Each time the above expression is evaluated, it will generate a random character whose code is between 0 and 127, inclusive (randint includes both endpoints).

As a result, it may sometimes generate non-printable characters, such as the characters with codes in the range 0 to 31, and 127. See the Wikipedia article about ASCII above, for information on printable versus non-printable characters.

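To generate characters from Python's string.printable, which contains the printable ASCII characters plus whitespace (including tab, newline and carriage return), use: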
choice(string.printable)
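Since string.printable also contains the whitespace control characters tab, newline, carriage return, vertical tab and form feed (codes 9 to 13), here is a minimal sketch that draws only from the 95 characters with codes 32 (space) to 126. The names VISIBLE_AND_SPACE and rand_visible_char are hypothetical.

```python
import string
from random import choice

# string.printable = digits + ascii_letters + punctuation + whitespace,
# where whitespace includes '\t', '\n', '\r', '\x0b' and '\x0c'.
# Keep everything except those, but allow the plain space.
VISIBLE_AND_SPACE = (string.digits + string.ascii_letters
                     + string.punctuation + ' ')

def rand_visible_char():
    '''Return one random ASCII character with code 32 to 126.'''
    return choice(VISIBLE_AND_SPACE)

print(repr(rand_visible_char()))
```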

We may want to generate all ASCII characters, or even all printable characters, only for some specialized purposes. More commonly, we may want to generate printable random characters from a specific subset of the complete ASCII character set. Some examples of this would be: generating random uppercase letters, random lowercase letters, random numeric digits, or combinations of those. Here are a few code snippets for those cases:
# Generate random uppercase letter.
chr(randint(ord('A'), ord('Z')))
(which relies on the fact that the ASCII codes for the characters 'A' through 'Z' are contiguous).
Or, another way:
# Generate random uppercase letter.
choice(string.ascii_uppercase)
# -------------------------------------------
# Generate random lowercase letter.
chr(randint(ord('a'), ord('z')))
Or, another way:
# Generate random lowercase letter.
choice(string.ascii_lowercase)
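The paragraph above also mentions random digits and combinations of character classes. A short sketch in the same style, assuming the imports shown earlier; the names ALNUM and UPPER_DIGITS are hypothetical.

```python
import string
from random import choice, randint

# Random decimal digit character, '0' to '9'.
digit = choice(string.digits)

# Same thing via character codes; '0' to '9' are also contiguous in ASCII.
digit2 = chr(randint(ord('0'), ord('9')))

# Random letter (either case) or digit.
ALNUM = string.ascii_letters + string.digits
alnum_char = choice(ALNUM)

# Random uppercase letter or digit.
UPPER_DIGITS = string.ascii_uppercase + string.digits
upper_or_digit = choice(UPPER_DIGITS)

print(digit, digit2, alnum_char, upper_or_digit)
```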
Random numbers can be used to generate random strings, where the randomness of the strings can be in either or both of two dimensions, the content or the length:

Generate strings with random character content but fixed length, e.g.: "tdczs", "ohybi", "qhmyf", "elazk"
def rand_lcase_str(n):
    '''Return string of n random lowercase letters.'''
    assert n > 0
    rand_chars = [ choice(string.ascii_lowercase) for i in range(n) ]
    return ''.join(rand_chars)

# Calls and output:
[ rand_lcase_str(3) for i in range(1, 8) ]
['xio', 'qsc', 'omt', 'fnn', 'ezz', 'get', 'frs']
[ rand_lcase_str(7) for i in range(1, 4) ]
['hazrdwu', 'sfvvxno', 'djmhxri']

Generate strings with fixed character content but random lengths, e.g.: "g", "gggg", "gg", "ggggg", "ggg"; all strings contain only letter g's, but are of different lengths.
def rand_len_fixed_char_str(c, low_len=1, high_len=256):
    '''Return a string containing a number of characters c,
    varying randomly in length between low_len and high_len'''
    assert len(c) == 1
    assert 0 < low_len <= high_len
    rand_chars = c * randint(low_len, high_len)
    return rand_chars

# Calls and output:
[ rand_len_fixed_char_str('g', 3, 8) for i in range(10) ]
['gggg',
 'ggggggg',
 'ggg',
 'ggggggg',
 'ggggg',
 'ggggg',
 'gggggg',
 'gggggg',
 'gggggg',
 'ggggg']
Generate strings with both random character content and random lengths, e.g.: "phze", "ysqhdty", "mltstwdg", "bnr", "q", "ifgcvgrey". This should be easy after the above snippets, since it can reuse parts of their logic, so it is left as an exercise for the reader.
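If you want to check your answer to the exercise, here is one possible sketch: pick the length with randint, then fill it with choice, as in the two functions above. The name rand_len_lcase_str and its default bounds are hypothetical.

```python
import string
from random import choice, randint

def rand_len_lcase_str(low_len=1, high_len=10):
    '''Return a string of random lowercase letters whose length is
    chosen at random between low_len and high_len, inclusive.'''
    assert 0 < low_len <= high_len
    n = randint(low_len, high_len)
    return ''.join(choice(string.ascii_lowercase) for i in range(n))

print([rand_len_lcase_str(1, 9) for i in range(6)])
```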

Such randomly generated data is useful for many purposes, e.g. for testing apps that read or write CSV or TSV files, fixed-length or variable-length records, spreadsheets or databases, and for testing report generation logic (particularly text formatting, wrapping, centering and justification, and logic related to column and line widths).
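As a sketch of the CSV case, assuming a hypothetical three-column record layout (name, code, quantity) and Python 3, random rows can be written with the standard csv module to an in-memory buffer and read back. Including commas, double quotes and spaces in the random content exercises the quoting rules that hand-written CSV code often gets wrong. The helper names rand_field and make_rows are hypothetical.

```python
import csv
import io
import string
from random import choice, randint

# Characters that often trip up hand-written CSV code: commas, quotes, spaces.
TRICKY = string.ascii_letters + ',"' + ' '

def rand_field(max_len=12):
    '''Return a random string of 0 to max_len characters from TRICKY.'''
    return ''.join(choice(TRICKY) for i in range(randint(0, max_len)))

def make_rows(n):
    '''Return n hypothetical (name, code, quantity) rows of random data.'''
    return [[rand_field(), rand_field(5), str(randint(0, 9999))]
            for i in range(n)]

rows = make_rows(100)

# Write the rows as CSV into memory, then read them back.
buf = io.StringIO()
csv.writer(buf).writerows(rows)
buf.seek(0)
read_back = list(csv.reader(buf))

# True if every field survived the round trip unchanged.
print(read_back == rows)
```

The same rows could instead be written to a file and fed to the app under test.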

All these use cases can benefit from running them on random data (maybe with some programmed constraints, as I showed above), to test the app more thoroughly than can be done manually by typing in, say, only a few dozen variations of test data. There are at least two benefits here:

- a program can be more systematically random (if that makes sense) than a human can, thereby giving test data that provides better coverage;

- the computer can generate large volumes of random data for testing the app, much faster than a human can. It can also feed it as input to the software you want to test, faster than a human can, e.g. by reading it from a file instead of a user typing it. So overall, (parts of) your testing work can get done a lot faster.

In the next part, I'll show how, using a mathematical concept, random numbers can be used to reduce the amount of test data needed to test some apps, while still maintaining a good level of quality of testing. I will also discuss / show some other uses of randomness, such as in web development, and simulating physical events.

The image at the top of the post is of a Roman denarius in silver of Maximinus (235-238). The word denarius seems to be the origin of the word for money in multiple modern languages, according to the linked article.

- Vasudev Ram - Online Python training and consulting

Signup to hear about my new courses and products.




Sunday, June 26, 2016

A Pythonic ratio for pi (or, Py for pi :)

By Vasudev Ram

Py


Pi image attribution

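A Python built-in method can be used to find a ratio (of two integers) that exactly equals math.pi, the double-precision floating-point value closest to the mathematical constant pi. (Pi itself is irrational, so no ratio of two integers equals it exactly.) [1]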

This is how:
from __future__ import print_function
Doing:
dir(0.0) # or dir(float)
gives (some lines truncated):
'__str__', '__sub__', '__subclasshook__', '__truediv__', '__trunc__', 
'as_integer_ratio', 'conjugate', 'fromhex', 'hex', 'imag', 'is_integer', 'real']
>>>
from which we see that as_integer_ratio is a method of float objects. (Floats are objects, so they can have methods.) So:
>>> import math
>>> import sys

>>> tup = math.pi.as_integer_ratio()
>>> tup
(884279719003555, 281474976710656)

>>> tup[0] / tup[1]
3.141592653589793

>>> print(sys.version)
3.6.0a2 (v3.6.0a2:378893423552, Jun 14 2016, 01:21:40) [MSC v.1900 64 bit (AMD64)]
>>>
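The returned pair is exact: its ratio is precisely the binary floating-point value stored in math.pi, so the denominator is a power of two. The standard fractions module can confirm this, and its limit_denominator method finds the closest fraction whose denominator does not exceed a given bound, which recovers the classic approximations 22/7 and 355/113. A minimal sketch:

```python
import math
from fractions import Fraction

num, den = math.pi.as_integer_ratio()

# The denominator is a power of two (2**48 here), as expected for a binary float.
print(den == 2 ** 48)

# Fraction(float) converts the float exactly, so it agrees with as_integer_ratio().
print(Fraction(math.pi) == Fraction(num, den))

# Closest fractions with small denominators: familiar approximations of pi.
print(Fraction(math.pi).limit_denominator(10))    # 22/7
print(Fraction(math.pi).limit_denominator(1000))  # 355/113
```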
I was using Python 3.6 above. If you do this in Python 2.7, the "/" operator does integer (floor) division when both operands are integers, so you have to multiply by a float to make float division happen:
>>> print(sys.version)
2.7.11 (v2.7.11:6d1b6a68f775, Dec  5 2015, 20:40:30) [MSC v.1500 64 bit (AMD64)]

>>> tup[0] / tup[1]
3L
>>> 1.0 * tup[0] / tup[1]
3.141592653589793
>>>
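Another option, which behaves the same under Python 2.7 and Python 3, is to enable true division for the whole module with a __future__ import. Like all __future__ imports, it must come at the top of the module, before any other statements; "//" remains available for floor division. A small sketch:

```python
from __future__ import print_function, division

import math

num, den = math.pi.as_integer_ratio()

# With the division import, "/" is true division even for Python 2.7 ints/longs.
print(num / den)    # 3.141592653589793

# "//" is floor division in both versions.
print(num // den)   # 3
```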
[1] There are many Wikipedia topics related to pi.
Also check out a few of my earlier math-related posts (including the one titled "Bhaskaracharya and the man who found zero" :)

The second post in the series on the uses of randomness will be posted in a couple of days - sorry for the delay.

- Vasudev Ram - Online Python training and consulting
