Tuesday, October 16, 2012

Swapping pipe components at runtime with pipe_controller


By Vasudev Ram

In my previous post on pipe_controller, Using PipeController to run a pipe incrementally, I mentioned that it had some interesting properties. That post gave an example of one such property: the ability to run a pipe incrementally under program control, with successive outputs going to different files on each incremental run.

This post talks about another pipe_controller property that I discovered by experimentation: you can swap the order of components in a pipeline at runtime, programmatically. That is, you can do something like (using UNIX syntax, though pipe_controller is in Python and works differently):

foo | bar | baz # with output going to file 1

then swap the positions of foo and baz, then run the pipe again:

baz | bar | foo # with output going to file 2

and so on - any number of times, all in the same program run.

This feature lets you experiment with, and validate, your pipeline logic, to make sure that it does what you intended. For example, you can check the output both before and after swapping components of the pipe, to decide which order you really need, or which order is optimal (see the next paragraph).

The feature can also be used to time the execution of two or more versions of the pipeline (with and without swapping of components), to see which runs faster. This only makes sense in cases where changing the order of those components makes no difference to the output, e.g. if the two components are commutative, in the mathematical sense (like a + b = b + a).

Obvious caveat: a timing test will only show you whether version A or B is faster for the given input, not for other inputs. But after studying the results of a few tests, you may be able to use logic or induction to figure out a rule (about the relative speeds) that applies to most or all of the data.
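As a rough sketch of such a comparison (this is not pipe_controller code: the helpers run_pipe and time_pipe, and the stand-in steps upcase and delspace, are assumptions made for illustration, and time.perf_counter assumes Python 3), the following first checks that two orderings give the same output on a sample input, and only then times each one:

```python
import time

def upcase(s):
    # Stand-in step: convert all letters to uppercase.
    return s.upper()

def delspace(s):
    # Stand-in step: delete all space characters.
    return s.replace(" ", "")

def run_pipe(processors, lines):
    # Apply each processor, in list order, to every line.
    out = []
    for line in lines:
        for proc in processors:
            line = proc(line)
        out.append(line)
    return out

def time_pipe(processors, lines, repeat=1000):
    # Return the total seconds taken to run the pipe `repeat` times.
    start = time.perf_counter()
    for _ in range(repeat):
        run_pipe(processors, lines)
    return time.perf_counter() - start

lines = ["some lowercase text", "more lowercase text"]
order_a = [upcase, delspace]
order_b = [delspace, upcase]

# Only compare speeds if both orders give the same output on this input.
same_output = run_pipe(order_a, lines) == run_pipe(order_b, lines)
if same_output:
    print("order A: %.4f s" % time_pipe(order_a, lines))
    print("order B: %.4f s" % time_pipe(order_b, lines))
```

The timings will vary from run to run and from machine to machine, so it is probably worth repeating the measurement a few times before drawing any conclusion.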

To enable the feature, I added this method, swap_processors(), to the PipeController class (in file pipe_controller.py):
def swap_processors(self, processor1, processor2):
    """
    PipeController method.
    It lets the caller swap the positions of two
    processors in the list.
    """
    debug("entered PipeController.swap_processors")
    pos1 = find_element(self._processors, processor1)
    pos2 = find_element(self._processors, processor2)
    if (pos1 == -1) or (pos2 == -1):
        # Either or both processors not found, exit.
        sys.stderr.write("Error: processor1 or 2 not found in list\n")
        sys.exit(1)
    else:
        # Found both, swap their positions.
        self._processors[pos1], self._processors[pos2] = \
            self._processors[pos2], self._processors[pos1]
    debug("exiting PipeController.swap_processors")
This method uses the helper function find_element():
# Find index of given element in list lis.
# Return index (>=0) if found, else -1.
def find_element(lis, element):
    try:
        pos = lis.index(element)
    except ValueError:
        pos = -1
    return pos
With these additions, you can run this program, test_pipe_controller_04.py, which demos swapping pipe components at runtime. It uses the same input file, it1, as in the earlier post about pipe_controller:
$ cat it1
     1  some lowercase text
     2  more lowercase text
     3  even more lowercase text
     4  yet more lowercase text
Run the new test program like this:
$ python test_pipe_controller_04.py it1 ot04-
The last command-line argument, ot04-, ends with a hyphen because it is a prefix for the 3 output files created: ot04-001, ot04-002, and ot04-003.
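A minimal sketch of how such names could be built from the prefix (the helper output_name is hypothetical, and the zero-padded 3-digit numbering is inferred only from the file names listed above, not taken from the test program itself):

```python
def output_name(prefix, run_number):
    # Hypothetical helper: append a zero-padded, 3-digit run number to the prefix.
    return "%s%03d" % (prefix, run_number)

# Build the names for runs 1 to 3 with the prefix from the command line above.
names = [output_name("ot04-", n) for n in range(1, 4)]
print(names)
```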

The test program does these things:

1. Runs the pipe [ oto0, eto3, upcase, delspace ] on the input. The output is:
$ cat ot04-001
1       S0M3L0W3RCAS3T3XT
2       M0R3L0W3RCAS3T3XT
3       3V3NM0R3L0W3RCAS3T3XT
4       Y3TM0R3L0W3RCAS3T3XT

2. Swaps the positions of oto0 and upcase. Then runs the modified pipe [ upcase, eto3, oto0, delspace ] on the same input. The output is:
$ cat ot04-002
1       SOMELOWERCASETEXT
2       MORELOWERCASETEXT
3       EVENMORELOWERCASETEXT
4       YETMORELOWERCASETEXT
Due to the modified pipeline, all lowercase letters get converted to uppercase first, so the later-run functions eto3 and oto0, which only act on lowercase "e" and "o", now have no effect on the input, but delspace still does.

3. Swaps the current positions of eto3 and upcase. Then runs the modified pipe [ eto3, upcase, oto0, delspace ] on the same input. The output is:
$ cat ot04-003
1       SOM3LOW3RCAS3T3XT
2       MOR3LOW3RCAS3T3XT
3       3V3NMOR3LOW3RCAS3T3XT
4       Y3TMOR3LOW3RCAS3T3XT
This time, due to the pipeline being modified again, all lowercase letters "e" get converted to "3" first, then all remaining letters get converted to uppercase, so the later-run function oto0 now has no effect on the input, but delspace still does.
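The three runs above can be reproduced on a single line of input with plain functions and a list, without pipe_controller. This is only a sketch: the bodies of oto0, eto3, upcase and delspace are guesses based on their names and on the outputs shown, and run_pipe and swap are hypothetical helpers, not part of the module.

```python
def oto0(s):
    # Stand-in: replace lowercase "o" with the digit "0".
    return s.replace("o", "0")

def eto3(s):
    # Stand-in: replace lowercase "e" with the digit "3".
    return s.replace("e", "3")

def upcase(s):
    # Stand-in: convert all letters to uppercase.
    return s.upper()

def delspace(s):
    # Stand-in: delete all space characters.
    return s.replace(" ", "")

def run_pipe(processors, text):
    # Apply each processor to the text, in list order.
    for proc in processors:
        text = proc(text)
    return text

def swap(processors, p1, p2):
    # Swap two processors in place; list.index raises ValueError
    # if either one is not in the list.
    i, j = processors.index(p1), processors.index(p2)
    processors[i], processors[j] = processors[j], processors[i]

text = "some lowercase text"
pipe = [oto0, eto3, upcase, delspace]
run1 = run_pipe(pipe, text)

swap(pipe, oto0, upcase)   # pipe is now [upcase, eto3, oto0, delspace]
run2 = run_pipe(pipe, text)

swap(pipe, eto3, upcase)   # pipe is now [eto3, upcase, oto0, delspace]
run3 = run_pipe(pipe, text)

print(run1)
print(run2)
print(run3)
```

Unlike swap_processors(), this swap helper lets the ValueError propagate to the caller instead of exiting the program; either choice seems reasonable for a small experiment.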

To reiterate, this ability to swap components at runtime and re-run the pipe (with output going to a different file each time) allows you to experiment with and validate your pipeline logic, and to compare the performance of different pipeline orderings.

This updated version of pipe_controller is available on Bitbucket, here:

Python pipe_controller module.

- Vasudev Ram - Dancing Bison Enterprises
