Friday, September 28, 2012

Using PipeController to run a pipe incrementally


By Vasudev Ram


In my earlier post about PipeController v0.1 (my experimental tool to simulate pipe-like functionality in Python), I had said that it had some interesting properties (that I did not design up front, but discovered after using it a bit). Here is one of them; it needed some code changes.

The main changes in PipeController v0.2 (download link) are:

- adding support for input and output filenames as arguments to the PipeController class's __init__ method, and also to be able to set those filenames via setter methods;

- two functions, open_for_read() and open_for_write(), to open filenames if given;

- calling of the functions to open files is now done in the run_pipe() method, not the __init__() method; this is a change that enables running pipes incrementally;

- a set_input_source() and a set_output_dest() method; this is also a change that enables running pipes incrementally;

- an add_processors() method to add a list of processors in one call, instead of just one at a time as add_processor() does.
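To make the effect of these changes concrete, here is a minimal sketch of the idea. It is not the v0.2 source: the class name SketchPipeController, its attribute names, and the line-by-line processing are assumptions for illustration, and each processor is assumed to be a function that takes a string and returns a string. The point it tries to show is that when files are opened inside run_pipe() rather than in __init__(), the same object can be rerun with a new output destination and a longer list of processors.

```python
import os
import tempfile


class SketchPipeController:
    """Hypothetical stand-in for PipeController; not the v0.2 source."""

    def __init__(self, input_source=None, output_dest=None):
        # Only remember the filenames here; nothing is opened yet.
        self._input_source = input_source
        self._output_dest = output_dest
        self._processors = []

    def set_input_source(self, input_source):
        self._input_source = input_source

    def set_output_dest(self, output_dest):
        self._output_dest = output_dest

    def add_processor(self, processor):
        self._processors.append(processor)

    def add_processors(self, processors):
        self._processors.extend(processors)

    def run_pipe(self):
        # Files are opened per run, so every run re-reads the input from
        # the start and writes to whichever destination is set right now.
        with open(self._input_source) as infile, \
                open(self._output_dest, "w") as outfile:
            for line in infile:
                for processor in self._processors:
                    line = processor(line)
                outfile.write(line)


# Small demo in a temporary directory (file names are illustrative).
workdir = tempfile.mkdtemp()
in_filename = os.path.join(workdir, "it1")
with open(in_filename, "w") as f:
    f.write("some lowercase text\n")

pc = SketchPipeController(input_source=in_filename)

pc.add_processor(str.upper)
pc.set_output_dest(os.path.join(workdir, "ot1-001"))
pc.run_pipe()  # first run: one processor

pc.add_processor(lambda s: s.replace(" ", ""))
pc.set_output_dest(os.path.join(workdir, "ot1-002"))
pc.run_pipe()  # second run: two processors, new output file
```

If the files were opened once in __init__() instead, the second run would find the input already consumed and would have no way to switch to a new output file.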

And this is the main code fragment (from file test_pipe_controller_03.py) that enables incremental processing (together with the above changes and other misc. stuff):
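# Fragment from test_pipe_controller_03.py; in_filename,
# out_filename_prefix and debug() come from the rest of that file.
pc = PipeController(input_source = in_filename)
processors = [ oto0, eto3, upcase, delspace ]
ctr = 1

# Each pass adds one more processor, then reruns the whole pipe,
# sending the output to a new file (prefix + 001, 002, ...).
for processor in processors:
    out_filename = out_filename_prefix + str(ctr).zfill(3)
    tmp_str = "Run #%d: out_filename = %s" % (ctr, out_filename)
    debug(tmp_str)
    pc.set_output_dest(out_filename)
    pc.add_processor(processor)
    debug("before pc.run_pipe()")
    pc.run_pipe()
    ctr += 1
debug("exiting main")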
You can download PipeController v0.2 here.

With these changes, it is possible to run PipeController pipes incrementally.

Refer to the Python functions used in my earlier post, while reading the description below:

By "incrementally", I mean: first run a pipe with only one component, say oto0; then run it with oto0 piped to eto3; then with oto0 piped to eto3 piped to upcase; and so on.

In each run, the output can go to a different file. This can help to debug the pipe's logic: run it incrementally, and check the output generated by each stage.
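The incremental idea can also be sketched without any files at all. In the sketch below, the bodies of oto0, eto3, upcase and delspace are guesses inferred from their names and from the sample outputs in this post (the real definitions are in the earlier post), and run_incrementally is a hypothetical helper, not part of PipeController.

```python
def oto0(text):
    # Assumed behavior: replace each lowercase "o" with the digit "0".
    return text.replace("o", "0")


def eto3(text):
    # Assumed behavior: replace each lowercase "e" with the digit "3".
    return text.replace("e", "3")


def upcase(text):
    # Assumed behavior: convert the text to uppercase.
    return text.upper()


def delspace(text):
    # Assumed behavior: delete all space characters.
    return text.replace(" ", "")


def run_incrementally(text, processors):
    """Return one output per run; run k applies the first k processors."""
    outputs = []
    active = []
    for processor in processors:
        active.append(processor)
        result = text
        for func in active:
            result = func(result)
        outputs.append(result)
    return outputs


stages = run_incrementally("some lowercase text",
                           [oto0, eto3, upcase, delspace])
for number, output in enumerate(stages, start=1):
    print("run %03d: %s" % (number, output))
```

Comparing consecutive entries of stages shows exactly what each newly added stage changed, which is the debugging use described above.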

Another use for this feature is that it saves the intermediate outputs, each of which may be useful in its own right. This need commonly occurs in business (or scientific or other) data-processing situations.

An example with a UNIX analogy: with PipeController v0.2, you can do the Python equivalent of this command sequence:
$ cat it1 | oto0 > ot1-001

$ cat it1 | oto0 | eto3 > ot1-002

$ cat it1 | oto0 | eto3 | upcase > ot1-003

$ cat it1 | oto0 | eto3 | upcase | delspace > ot1-004
(assuming that you have UNIX commands equivalent to the functions oto0, eto3, upcase and delspace mentioned in my previous post; such commands are trivial to write using tools like tr.)

Each of the above 4 commands is one incremental run of the pipe, with an additional command tacked on the end each time, and each run generates its output in a different file.

You can do the same sort of thing with PipeController, but with a single program, like this (using the same Python functions and same input file as in the original post about v0.1):
$ python test_pipe_controller_03.py it1 ot1-
The last argument in the above line is actually "ot1-", not a typo; it is a prefix to the output filenames that will be generated by the program. And test_pipe_controller_03.py is a new test program, part of the v0.2 release.
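As a quick illustration of how such a prefix turns into the output filenames, here is a small sketch that uses the same str(ctr).zfill(3) expression as the code fragment shown earlier. The helper name output_filenames is hypothetical and is not part of the release.

```python
def output_filenames(prefix, count):
    """Build numbered names by appending a zero-padded counter to prefix."""
    return [prefix + str(ctr).zfill(3) for ctr in range(1, count + 1)]


# Four processors means four runs, hence four output files.
for name in output_filenames("ot1-", 4):
    print(name)
```

The zero padding keeps the files in run order when a directory listing sorts them by name.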

For the input file it1 (same one as in the original post),
$> cat it1
some lowercase text
more lowercase text
even more lowercase text
yet more lowercase text
the incremental outputs from the above program run are as follows:
$> cat ot1-001
s0me l0wercase text
m0re l0wercase text
even m0re l0wercase text
yet m0re l0wercase text

$> cat ot1-002
s0m3 l0w3rcas3 t3xt
m0r3 l0w3rcas3 t3xt
3v3n m0r3 l0w3rcas3 t3xt
y3t m0r3 l0w3rcas3 t3xt

$> cat ot1-003
S0M3 L0W3RCAS3 T3XT
M0R3 L0W3RCAS3 T3XT
3V3N M0R3 L0W3RCAS3 T3XT
Y3T M0R3 L0W3RCAS3 T3XT

$> cat ot1-004
S0M3L0W3RCAS3T3XT
M0R3L0W3RCAS3T3XT
3V3NM0R3L0W3RCAS3T3XT
Y3TM0R3L0W3RCAS3T3XT
The final output here is the same as in the original example in my previous post, but now you also have the intermediate results generated, for debugging or other uses.

Some notes:

1. I've renamed the file containing the PipeController class, from the earlier name pipes.py (in v0.1) to pipe_controller.py, to avoid a clash with the pipes module in the standard Python library. Also, pipe_controller.py is now a module; i.e., you can now do "from pipe_controller import PipeController" in your own Python program to use the PipeController class; see the file test_pipe_controller_03.py as an example of that.

2. I've also renamed the v0.2 zip file from the earlier name unix-pipes.zip to pipe_controller-v0.2.zip, because a reader on the comp.lang.python newsgroup rightly commented that calling it unix-pipes was a bit misleading (though unintentionally so), since PipeController does not enable IPC between programs, as UNIX pipes do; it only enables a sort of pipelined communication between functions in a single Python program.

- Vasudev Ram - Dancing Bison Enterprises

2 comments:

Anonymous said...

Have you tried https://github.com/julienpalard/pipe ? (easy_install pipe)

Vasudev Ram said...

Yes, I had seen it. Follow the links recursively (to depth 2 or so), starting from the first link in the post above. You'll see that I mentioned that pipe module in my first post, about a year back, along with some other ones too.

They are all different approaches. Choice is good.