Wednesday, February 3, 2016

Using Python's trace module to understand the flow of programs (from many angles)

By Vasudev Ram


Some time back, I had written this post:

Python's trace module and chained decorators

in which I had briefly described use of the Python standard library's trace module to help us understand and debug Python programs. In that post I showed a small program with 3 chained decorators. The trace module was used to trace the execution of the decorators and the functions that they decorated.

The trace output in that post also showed the difference between function definition and function execution, both of which occur at run time in Python, since it is a dynamic language.

In this post, I'm going to use the trace module in a few other ways.

Here is a small program in which we will use the trace module.

(Note that I said "in which", not "on which", because I am going to invoke the trace module as a library from within this program, and tell it to start tracing from a specific function call, unlike in my previous post (linked above), in which I used the trace module as though it was a program itself, by using the "python -m" option, to trace my own program containing those decorators.)
# This is a program to show some basic usage of the trace module 
# from the Python standard library.
# Author: Vasudev Ram - 
# http://jugad2.blogspot.in/p/about-vasudev-ram.html
# Copyright 2016 Vasudev Ram

def fa():
    fb()

def fb():
    fc()

def fc():
    fd(5)

def fd(n):
    if n <= 1:
        return
    else:
        fd(n - 1)

import trace

tracer = trace.Trace(
    count=0, trace=0, countfuncs=0, countcallers=1, 
)

tracer.run('fa()')
r = tracer.results()
r.write_results(show_missing=True, coverdir=".")
In this program, I have a function fa calling fb which calls fc which calls fd. Function fd also calls itself recursively, with a termination condition so the recursion is not infinite.

Note that the program contains all the three main programming constructs: sequential execution of statements, conditional execution and iteration (via the recursive call in function fd).

From the Python documentation, the trace.Trace() constructor has the following (partial) signature:

class trace.Trace(count=1, trace=1, countfuncs=0, countcallers=0, ...)

where I have used ellipsis (...) to represent the remaining arguments, which I do not pass. (See the docs (linked above, near top of post) for the full signature and the meaning of all the arguments. The ones I use are explained below.)

As for what those arguments do, again, from the docs:

Create an object to trace execution of a single statement or expression. All parameters are optional. count enables counting of line numbers. trace enables line execution tracing. countfuncs enables listing of the functions called during the run. countcallers enables call relationship tracking.

I ran the program simple_prog.py 4 times. Each time I passed a different combination of argument values to trace.Trace() - with only one argument set to 1 each time, and all the others set to 0 in that run. And each time I redirected the resulting trace output to an output file, except for the first run, in which Python created the output in the file simple_prog.cover (since the program being traced is named simple_prog.py.

The lines below show the sets of arguments passed, and the corresponding output file names for the trace output. (I will show the contents of each output file later, below.)

arg set 1: count=1, trace=0, countfuncs=0, countcallers=0, output: simple_prog.cover

arg set 2: count=0, trace=1, countfuncs=0, countcallers=0, output: run_2.txt

arg set 3: count=0, trace=0, countfuncs=1, countcallers=0, output: run_3.txt

arg set 4: count=0, trace=0, countfuncs=0, countcallers=1, output: run_4.txt

Notice that in each of the arg sets, only one flag is set.

I ran the program this time with just:

python simple_prog.py

since the tracing is started from within the program, unlike in my previous post (linked above) in which I started the tracing like this:

python -m trace -t test_chained_decorators.py

Here is the output for the 1st run, simple_prog.cover:

# This is a program to show some basic usage of the trace module 
       # from the Python standard library.
       # Author: Vasudev Ram - 
       # http://jugad2.blogspot.in/p/about-vasudev-ram.html
       # Copyright 2016 Vasudev Ram
       
>>>>>> def fa():
    1:     fb()
       
>>>>>> def fb():
    1:     fc()
       
>>>>>> def fc():
    1:     fd(5)
       
>>>>>> def fd(n):
    5:     if n <= 1:
    1:         return
           else:
    4:         fd(n - 1)
       
>>>>>> import trace
       
>>>>>> tracer = trace.Trace(
>>>>>>     count=1, trace=0, countfuncs=0, countcallers=0, 
       )
       
>>>>>> tracer.run('fa()') # This line starts the tracing process.
>>>>>> r = tracer.results()
>>>>>> r.write_results(show_missing=True, coverdir=".")
You can see that the output includes the counts of the number of times lines of code are called. Notice that there are no line counts for the def lines, presumably because function definitions are only supposed to be done once (by definition, ha ha), so it would not make sense to show it, and might even be confusing.

Here is the output for the 2nd run, run_2.txt:

--- modulename: simple_prog, funcname: <module>
<string>(1):   --- modulename: simple_prog, funcname: fa
simple_prog.py(6):     fb()
 --- modulename: simple_prog, funcname: fb
simple_prog.py(9):     fc()
 --- modulename: simple_prog, funcname: fc
simple_prog.py(12):     fd(5)
 --- modulename: simple_prog, funcname: fd
simple_prog.py(15):     if n <= 1:
simple_prog.py(18):         fd(n - 1)
 --- modulename: simple_prog, funcname: fd
simple_prog.py(15):     if n <= 1:
simple_prog.py(18):         fd(n - 1)
 --- modulename: simple_prog, funcname: fd
simple_prog.py(15):     if n <= 1:
simple_prog.py(18):         fd(n - 1)
 --- modulename: simple_prog, funcname: fd
simple_prog.py(15):     if n <= 1:
simple_prog.py(18):         fd(n - 1)
 --- modulename: simple_prog, funcname: fd
simple_prog.py(15):     if n <= 1:
simple_prog.py(16):         return
 --- modulename: trace, funcname: _unsettrace
trace.py(80):         sys.settrace(None)
Since the trace flag is set, we get line execution tracing. And due to that, there are 5 entries for function fd, since there are 5 calls to it (of which 4 are recursive).

Here is the output for the 3rd run, run_3.txt:

functions called:
filename: <string>, modulename: <string>, funcname: <module>
filename: D:\Anaconda-2.1.0-64\lib\trace.py, modulename: trace, funcname: _unsettrace
filename: simple_prog.py, modulename: simple_prog, funcname: fa
filename: simple_prog.py, modulename: simple_prog, funcname: fb
filename: simple_prog.py, modulename: simple_prog, funcname: fc
filename: simple_prog.py, modulename: simple_prog, funcname: fd
Since the countfuncs flag is set, it shows the functions called during the run of the program.

Here is the output for the 4th run, run_4.txt:

calling relationships:

*** <string> ***
  --> simple_prog.py
    <string>.<module> -> simple_prog.fa

*** D:\Anaconda-2.1.0-64\lib\trace.py ***
  --> <string>
    trace.Trace.runctx -> <string>.<module>
    trace.Trace.runctx -> trace._unsettrace

*** simple_prog.py ***
    simple_prog.fa -> simple_prog.fb
    simple_prog.fb -> simple_prog.fc
    simple_prog.fc -> simple_prog.fd
    simple_prog.fd -> simple_prog.fd
Since the countcallers flag is set, it shows the function call tree during the run of the program, i.e. what function called what other function(s). The last line shows the recursive call to fd.

So we can see, overall, that those four flags to trace.Trace(), allowed us to get insight into the behaviour of the program, from various angles or perspectives. This makes the trace module a useful tool for debugging and for understanding code that we have to work on.

- Enjoy.

- Vasudev Ram - Online Python training and programming

Signup to hear about new products and services I create.

Posts about Python  Posts about xtopdf

My ActiveState recipes

Sunday, January 24, 2016

RIP Ed Yourdon


https://en.wikipedia.org/wiki/Edward_Yourdon

http://yourdon.com/about/

" According to the December 1999 issue of Crosstalk: The Journal of Defense Software Engineering, Ed Yourdon is one of the ten most influential men and women in the software field."

He had a lot of influence on the Indian software industry too, from its early days.

Tuesday, January 19, 2016

Simple drawing program with Python turtle graphics

By Vasudev Ram


Image attribution: Claudio Giovenzana www.longwalk.it

Some time back I had written a post with a simple turtle graphics program in Python:

(The history of turtle graphics is of interest, including the early years, when they used actual robots for the turtles, which were controlled by programs). The Wikipedia page in the link above has the details.)

Here is the post I wrote earlier:

Python, meet Turtle

Python has support for turtle graphics in the standard library.

Today I wrote a different kind of turtle graphics program: a simple drawing program that allows you to interactively draw figures using a few keys on the keyboard.

The program is a first version, so is not polished. But it works, as far as I tested it.

You can use the following keys to control the turtle and draw:

- up arrow to move forward
- down arrow to move backward
- left arrow to turn left 45 degrees
- right arrow to turn right 45 degrees
- h or H key to move the turtle to its home position (center of screen, heading east)
- c or C key to clear the screen
- q or Q to quit the program

Note: If you do Clear and then Home, and if your turtle had been away from the home position when you did the Clear, the screen will get cleared, but then the Home action will draw a line from where the turtle just was to its (new) home position. If that happens, just press C again to clear the screen of that line.
The better way is to do Home first and then Clear.

Here is the code for the program:
# Program to do drawing using Python turtle graphics.
# turtle_drawing.py v0.1
# Author: Vasudev Ram
# http://jugad2.blogspot.in/p/about-vasudev-ram.html
# Copyright (C) 2016 Vasudev Ram.

import turtle

# Create and set up screen and turtle.

t = turtle
# May need to tweak dimensions below for your screen.
t.setup(600, 600)
t.Screen()
t.title("Turtle Drawing Program - by Vasudev Ram")
t.showturtle()

# Set movement step and turning angle.
step = 160
angle = 45

def forward():
    '''Move forward step positions.'''
    print "forward", step
    t.forward(step)

def back():
    '''Move back step positions.'''
    print "back", step
    t.back(step)

def left():
    '''Turn left by angle degrees.'''
    print "left", angle
    t.left(angle)

def right():
    '''Turn right by angle degrees.'''
    print "right", angle
    t.right(angle)

def home():
    '''Go to turtle home.'''
    print "home"
    t.home()

def clear():
    '''Clear drawing.'''
    print "clear"
    t.clear()

def quit():
    print "quit"
    t.bye()

t.onkey(forward, "Up")
t.onkey(left, "Left")
t.onkey(right, "Right")
t.onkey(back, "Down")
t.onkey(home, "h")
t.onkey(home, "H")
t.onkey(clear, "c")
t.onkey(clear, "C")
t.onkey(quit, "q")
t.onkey(quit, "Q")

t.listen()
t.mainloop()
Support for more turtle actions (see the Python turtle module's docs linked above) can be added, for many of them on the same pattern as the existing functions, without changing the overall structure of the program. Some others may require changes to the design, which is very simple as of now. (I applied the principle Do The Simplest Thing That Could Possibly Work.)

Also, a point to note: many of the capabilities of the turtle module are available in both procedural and object-oriented versions. Here, to keep it simple, I've used the procedural style.

Here are a few screenshots of drawings I created by running and using this program:

A diamond (just a square tilted):


A square:


A figure made out of squares:


The image at the top of the post is of Olive ridley sea turtles.

- Vasudev Ram - Online Python training and programming

Signup to hear about new products and services I create.

Posts about Python  Posts about xtopdf

My ActiveState recipes

Thursday, January 7, 2016

Code for recent post about PDF from a Python pipeline

By Vasudev Ram

In this recent post:

Generate PDF from a Python-controlled Unix pipeline ,

I forgot to include the code for the program PopenToPDF.py. Here it is now:
# PopenToPDF.py
# Demo program to read text from a shell pipeline using 
# subprocess.Popen, and write the text to PDF using xtopdf.
# Author: Vasudev Ram
# Copyright (C) 2016 Vasudev Ram - http://jugad2.blogspot.com

import sys
import subprocess
from PDFWriter import PDFWriter

def error_exit(message):
    sys.stderr.write(message + '\n')
    sys.stderr.write("Terminating.\n")
    sys.exit(1)

def main():
    try:
        # Create and set up a PDFWriter instance.
        pw = PDFWriter("PopenTo.pdf")
        pw.setFont("Courier", 12)
        pw.setHeader("Use subprocess.Popen to read pipe and write to PDF.")
        pw.setFooter("Done using selpg, xtopdf, Python and ReportLab, on Linux.")

        # Set up a pipeline with nl and selpg such that we can read from its stdout.
        # nl numbers the lines of the input.
        # selpg extracts pages 3 to 5 from the input.
        pipe = subprocess.Popen("nl -ba 1000-lines.txt | selpg -s3 -e5", \
            shell=True, bufsize=-1, stdout=subprocess.PIPE, 
            stderr=sys.stderr).stdout

        # Read from the pipeline and write the data to PDF, using the PDFWriter instance.
        for idx, line in enumerate(pipe):
            pw.writeLine(str(idx).zfill(8) + ": " + line)
    except IOError as ioe:
        error_exit("Caught IOError: {}".format(str(ioe)))
    except Exception as e:
        error_exit("Caught Exception: {}".format(str(e)))
    finally:
        pw.close()

main()
I ran it in the usual way with:
$ python PopenToPDF.py
to get the output shown in the previous post describing PopenToPDF.

Also, this is the one-off script, gen-file.py, that created the 1000 line input file:
with open("1000-lines.txt", "w") as fil:
    for i in range(1000):
        fil.write("This is a line of text.\n")
fil.close()

- Vasudev

- Vasudev Ram - Online Python training and programming

Signup to hear about new products and services I create.

Posts about Python  Posts about xtopdf

My ActiveState recipes

Generate PDF from a Python-controlled Unix pipeline


By Vasudev Ram


This post is about a new xtopdf app I wrote, called PopenToPDF.py.

(xtopdf is my PDF generation toolkit, written in Python. The toolkit consists of a core library and multiple applications built using it.)

This program, PopenToPDF, shows how to use xtopdf to generate PDF output from any Python-controlled Unix pipeline. It uses the subprocess Python module.

I had written a few posts earlier about the uses of StdinToPDF.py, another xtopdf app [1]

(There are many kinds of pipeline"; it is a powerful concept.)

StdinToPDF is an application of xtopdf that can be used at the end of a Unix or Windows pipeline, to publish the text output of the pipeline to PDF.

[1] Here are some of those posts about StdinToPDF:

a) PDFWriter can create PDF from standard input

b) Print selected text pages to PDF with Python, selpg and xtopdf on Linux

c) Generate Windows Task List to PDF with xtopdf

PopenToPDF has the same general goal as StdinToPDF (to allow creation of a pipeline whose final output is PDF), but works somewhat differently.

Instead of just being used passively (like StdinToPDF) as the last component in a pipeline run from the command line, PopenToPDf is a Python program that itself sets up and runs a pipeline (of all the preceding commands, excepting itself), using subprocess.Popen, and then reads the output of that pipeline, programmatically, and converts the text it reads to PDF. So it is a different approach that may allow for other possibilities for customization.

For the example, I created an input text file of 1000 lines, via a small one-off script. The file is called 1000-lines.txt.

The pipeline (created by PopenToPDF) runs "nl -ba" to add sequential line numbers to each line of the input file. (nl is a Unix command to number lines.) Then the output is passed to my selpg utility (a command-line utility in C), which is a filter that reads its input and selects only a specified range of pages to pass on to the output. (Full details of the selpg utility, including explanation of its logic, source code, and the build steps, are at the URL in the previous sentence, or at links accessible from that URL.)

(This page on sites.harvard.edu is a good resource for Linux command line utility development, and also references my IBM dW article about selpg.)

PopenToPDF sets up the above pipeline (nl -ba piped to selpg), and then reads all the lines from it, adds its own line numbers to the input, and writes it all to a PDF file.

Thus we end up with two sets of line numbers prefixed to each line (in the PDF): the original line numbers added by the nl command, which represents the position of each line extracted from the original file, and the serial numbers (starting from 0) for the subset of lines that PopenToPDF sees.

I did this so that we could verify that the pipeline is extracting the right lines that we specified, by looking at the relative and absolute line numbers in the output (screenshots below).

Here is a screenshot of the first page of the PDF output:


And here is a screenshot of the last page, page 4, of the PDF output:


You can see that the last relative line number (added by PopenToPDF, in the extreme left number column) is 215, and the first was 0 (on the first page), so the number of lines extracted by selpg is 216, which corresponds to what we asked selpg for by specifying a start page of 3 (-s3) and an end page of 5 (-e5), since there are 72 lines per page (the default) and 72 * (5 -3 + 1) = 72 * 3 = 216. You can do a similar calculation for the absolute line numbers shown, to verify that we have extracted not only the right number of pages, but also the right pages.

So this approach (using Popen) can be used to run a pipeline under control of a Python program, read the output of the pipeline, and do some further processing on it. Obviously, it is a generic approach, not limited to producing PDF. It could be used for any other purpose where you want to run a pipeline under program control and do something with the output of the pipeline, in your own Python code.

I'll end with a few points about related topics:

This program is actually an example of a category of data processing operations commonly used in organizations, which can be broadly described as starting with some data source, and passing it through a series of transformations until we have the final output we want.

Often, but not always, the input for these transformations is downloaded from some database or application (of the organization), and/or the output is uploaded to another database or application (also of the organization).

In some of these cases, the process is called ETL, meaning Extract, Transform, Load. This operation is also related to IT system integration.

In general, these tasks can consist of a combination of the use of existing components (programs) and purpose-written code in a compiled or interpreted language. The operation can also consist of a combination of manual and automated steps.

When there is enough uniformity in the data and needed processing rules across runs, using more automation leads to more time and cost savings. Some amount of variation in the data or rules can be handled by parameterization of input and output filenames, database connections, table names, use of conditional logic, etc.

Finally, in the process of writing this program and post (across a couple of sessions), I came across mentions of microservices in tech forums. Microservices have been in the news for a while. So I looked up definitions of microservices and realized that they are in some ways similar to Unix pipelines and the Unix philosophy of creating small tools that do one thing well, and then combining them to achieve bigger tasks.

If you're interested in pipes and Python and their intersection, also check out this HN comment by me, which lists multiple other Python pipe-like tools, including one (pipe_controller) by me:

Yes, pyp is interesting. So are some other roughly similar Python tools

- Enjoy.

- Vasudev Ram - Online Python training and programming

Signup to hear about new products and services I create.

Posts about Python  Posts about xtopdf

My ActiveState recipes