jugad2 - Vasudev Ram on software innovation: fmap()


Wednesday, September 14, 2016

Func-y D + Python pipeline to generate PDF

Hi, readers,

Here is a pipeline that generates some output from a D (language) program and passes it to a Python program that converts that output to PDF. The D program makes use of a bit of (simple) template programming / generics and a bit of functional programming (using the std.functional module from Phobos, D's standard library).

D => Python

I'm showing this as yet another example of the uses of xtopdf, my Python toolkit for PDF creation, as well as for the D part, which is fun (pun intended :), and because those D features are powerful.

The D program is derived, with some modifications, from a program in this post by Gary Willoughby,

More hidden treasure in the D standard library.

That program demonstrates, among other things, the pipe feature of D from the std.functional module.

First, the D program, student_grades.d:

/*
student_grades.d
Author: Vasudev Ram
Web site: https://vasudevram.github.io
Blog: http://jugad2.blogspot.com
Product store: https://gumroad.com/vasudevram
Adapts code from:
http://nomad.so/2015/08/more-hidden-treasure-in-the-d-standard-library/
*/

import std.stdio;
import std.algorithm;
import std.array;
import std.conv;
import std.functional;

// Set up the functional pipeline by composing some functions.
alias sumString = pipe!(split, map!(to!(int)), sum);

void main(string[] args)
{
    // Data to be transformed:
    // Each string has the student name followed by 
    // their grade in 5 subjects.
    auto student_grades_list = 
    [
        "A 1 2 3 4 5",
        "B 2 3 4 5 6",
        "C 3 4 5 6 7",
        "D 4 5 6 7 8",
        "E 5 6 7 8 9",
    ];

    // Transform the data for each student.
    foreach(student_grades; student_grades_list) {
        auto student = student_grades[0]; // name
        auto total = sumString(student_grades[2..$]); // grade total
        writeln("Grade total for student ", student, ": ", total);
    }
}

The initial data (which the D program transforms) is hard-coded, but that can easily be changed to read it from a text file, for instance. The program uses pipe() to compose [1] the functions split, map (to int), and sum (some of which are generic / template functions).

So, when it is run, for each string (student_grades) of the input array student_grades_list, the name is taken from the first character, and the rest of the string (the grades) is split into an array of smaller strings, using whitespace as the delimiter; then each of those strings is mapped (converted) to an integer; then all the integers are summed up to get the student's grade total; finally the names and totals are written to standard output. That output becomes the input to the next stage of the pipeline, the Python program, which does the conversion to PDF.
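For readers who know Python better than D, the same left-to-right composition can be sketched roughly as below. This is only an illustration in Python 3: the pipe() helper and the sum_string name are hypothetical stand-ins for D's pipe and the sumString alias, not part of xtopdf or of any Python library.

```python
from functools import reduce

def pipe(*functions):
    """Return a function that applies the given functions left to right."""
    def piped(value):
        return reduce(lambda acc, func: func(acc), functions, value)
    return piped

# Hypothetical Python counterpart of the D alias sumString:
# split on whitespace, convert each piece to int, then sum.
sum_string = pipe(str.split, lambda parts: map(int, parts), sum)

student_grades_list = [
    "A 1 2 3 4 5",
    "B 2 3 4 5 6",
]

for student_grades in student_grades_list:
    student = student_grades[0]             # name
    total = sum_string(student_grades[2:])  # grade total
    print("Grade total for student {}: {}".format(student, total))
```

As in the D version, the name is sliced off before the rest of the string enters the composed function.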

Build the D program with:

dmd student_grades.d

which gives us the executable, student_grades(.exe).

The Python part of the pipeline is StdinToPDF.py, one of the apps in the xtopdf toolkit. It is designed to be used in pipelines such as the one above: basically, any Unix, Windows or Mac OS X pipeline that generates text as its final output can have StdinToPDF added as a last stage, which converts that text to PDF. It is a small Python app written using the core class of the xtopdf library, PDFWriter. Here is the original post about StdinToPDF:

[xtopdf] PDFWriter can create PDF from standard input

Here is the pipeline:

$ student_grades | python StdinToPDF.py sg.pdf

Below is a cropped screenshot of the generated output, sg.pdf, as seen in Foxit PDF Reader.

[1] Speaking of functional composition, you may like to check out this earlier post by me:

fmap(), "inverse" of Python map() function

It's about a different way of composing functions (for certain cases). It also has an interesting comment exchange between me and a reader, who showed both how to do it in Scala, and another way to do it in Python. Also, there is another way to compose functions in D, using a function called compose; it works like pipe, but the composed functions are written in the reverse order. It's in the same D module as pipe, std.functional.
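To make the pipe versus compose ordering concrete, here is a small Python 3 sketch. It only mirrors the idea; the pipe() and compose() helpers and the two toy functions are assumptions made for illustration, not the D implementations from std.functional.

```python
from functools import reduce

def pipe(*functions):
    """Apply the functions left to right: the first one listed runs first."""
    def piped(value):
        return reduce(lambda acc, func: func(acc), functions, value)
    return piped

def compose(*functions):
    """Same idea, but the functions are listed in reverse: the last runs first."""
    return pipe(*reversed(functions))

def times_two(n):
    return n * 2

def add_one(n):
    return n + 1

piped = pipe(times_two, add_one)         # add_one(times_two(n))
composed = compose(add_one, times_two)   # also add_one(times_two(n))

print(piped(5), composed(5))
```

Both calls compute the same thing; only the order in which the functions are written differs.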

Finally (before I pipe down - for today :), check out this HN comment by me (on another topic), in which I mention and link to multiple other ways of doing pipe-like stuff in Python:

Comment on Streem – a new programming language from Matz

- Enjoy.

- Vasudev Ram - Online Python training and consulting


Tuesday, October 16, 2012

Swapping pipe components at runtime with pipe_controller

By Vasudev Ram

In my previous post on pipe_controller, Using PipeController to run a pipe incrementally, I mentioned that it had some interesting properties. That post gave an example of one such property: the ability to run a pipe incrementally under program control, with successive outputs going to different files on each incremental run.

This post talks about another pipe_controller property that I discovered by experimentation: you can swap the order of components in a pipeline at runtime, programmatically. That is, you can do something like (using UNIX syntax, though pipe_controller is in Python and works differently):

foo | bar | baz # with output going to file 1

then swap the positions of foo and baz, then run the pipe again:

baz | bar | foo # with output going to file 2

and so on - any number of times, all in the same program run.

This feature lets you experiment with, and validate, your pipeline logic, to make sure that it does what you intended, e.g. you can check the output both before and after swapping components of the pipe, to decide which order you really need - or which order is optimal - see next paragraph.

The feature can also be used to time the execution of two or more different versions of the pipeline (with/without swapping of components), to see which runs faster, in cases where changing the order of those components makes no difference to the output, e.g. if those two components are commutative, in the mathematical sense (like a + b = b + a).

Obvious caveat: a timing test will only show you whether version A or B is faster for the given input, not for other inputs. But after studying the results of a few tests, you may be able to use logic or induction to figure out a rule (about the relative speeds) that applies to most or all of the data.
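A timing comparison of this kind might look like the Python 3 sketch below. It does not use pipe_controller itself: run_pipe() is a hypothetical stand-in that just applies a list of functions in order, and upcase and delspace are simple versions of two components that happen to be commutative.

```python
import timeit

def upcase(s):
    return s.upper()

def delspace(s):
    return s.replace(" ", "")

def run_pipe(processors, item):
    """Hypothetical helper: apply each processor to the running result."""
    result = item
    for processor in processors:
        result = processor(result)
    return result

text = "some lowercase text " * 1000
order_a = [upcase, delspace]
order_b = [delspace, upcase]

# The comparison is only meaningful if both orderings give the same output.
assert run_pipe(order_a, text) == run_pipe(order_b, text)

time_a = timeit.timeit(lambda: run_pipe(order_a, text), number=200)
time_b = timeit.timeit(lambda: run_pipe(order_b, text), number=200)
print("upcase, delspace: {:.4f} s".format(time_a))
print("delspace, upcase: {:.4f} s".format(time_b))
```

The numbers will vary by machine and input, which is the caveat mentioned above.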

To enable the feature, I added this method, swap_processors(), to the PipeController class (in file pipe_controller.py):

def swap_processors(self, processor1, processor2):
    """
    PipeController method.
    It lets the caller swap the positions of two
    processors in the list.
    """
    debug("entered PipeController.swap_processors")
    pos1 = find_element(self._processors, processor1)
    pos2 = find_element(self._processors, processor2)
    if (pos1 == -1) or (pos2 == -1):
        # Either or both processors not found, exit.
        sys.stderr.write("Error: processor1 or 2 not found in list\n")
        sys.exit(1)
    else:
        # Found both, swap their positions.
        self._processors[pos1], self._processors[pos2] = \
            self._processors[pos2], self._processors[pos1]
    debug("exiting PipeController.swap_processors")

and which uses this function, find_element():

# Find index of given element in list lis.
# Return index (>=0) if found, else -1.

def find_element(lis, element):
    try:
        pos = lis.index(element)
    except ValueError:
        pos = -1
    return pos
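If exiting the whole program on a missing processor is too drastic for your use, one possible variation is a standalone helper that lets list.index() raise ValueError and leaves the decision to the caller. This is a sketch only, not the code in pipe_controller.py, and the name swap_items is hypothetical.

```python
def swap_items(lis, item1, item2):
    """Swap the positions of item1 and item2 in lis, in place.

    Raises ValueError (from list.index) if either item is missing.
    """
    pos1 = lis.index(item1)
    pos2 = lis.index(item2)
    lis[pos1], lis[pos2] = lis[pos2], lis[pos1]

stages = ["foo", "bar", "baz"]
swap_items(stages, "foo", "baz")
print(stages)  # ['baz', 'bar', 'foo']
```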

With these additions, you can run this program, test_pipe_controller_04.py, which demos swapping pipe components at runtime. It uses the same input file, it1, as in the earlier blog post about pipe_controller:

$ cat it1
     1  some lowercase text
     2  more lowercase text
     3  even more lowercase text
     4  yet more lowercase text

Run the new test program like this:

$ python test_pipe_controller_04.py it1 ot04-

The last command-line argument, ot04-, ends with a hyphen because it is a prefix for the 3 output files created: ot04-001, ot04-002, and ot04-003.

The test program does these things:

1. Runs the pipe [ oto0, eto3, upcase, delspace ] on the input. The output is:

$ cat ot04-001
1       S0M3L0W3RCAS3T3XT
2       M0R3L0W3RCAS3T3XT
3       3V3NM0R3L0W3RCAS3T3XT
4       Y3TM0R3L0W3RCAS3T3XT

2. Swaps the positions of oto0 and upcase. Then runs the modified pipe [ upcase, eto3, oto0, delspace ] on the same input. The output is:

$ cat ot04-002
1       SOMELOWERCASETEXT
2       MORELOWERCASETEXT
3       EVENMORELOWERCASETEXT
4       YETMORELOWERCASETEXT

Due to the modified pipeline, all lowercase letters get converted to uppercase first, so the later-run functions eto3 and oto0 now have no effect on the input, but delspace still does.

3. Swaps the current positions of eto3 and upcase. Then runs the modified pipe [ eto3, upcase, oto0, delspace ] on the same input. The output is:

$ cat ot04-003
1       SOM3LOW3RCAS3T3XT
2       MOR3LOW3RCAS3T3XT
3       3V3NMOR3LOW3RCAS3T3XT
4       Y3TMOR3LOW3RCAS3T3XT

This time, due to the pipeline being modified again, all lowercase letters "e" get converted to "3" first, then all letters get converted to uppercase, so the later-run function oto0 now has no effect on the input, but delspace still does.
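The three runs can be reproduced without pipe_controller in a few lines of Python 3. In this sketch, the bodies of oto0 and eto3 are assumptions inferred from the outputs shown above (lowercase "o" becomes "0", lowercase "e" becomes "3"), and run_pipe() and swap() are hypothetical helpers standing in for the PipeController methods. It works on the text of one line only, ignoring the leading line number.

```python
def oto0(s):
    return s.replace("o", "0")   # assumed behavior, inferred from ot04-001

def eto3(s):
    return s.replace("e", "3")   # assumed behavior, inferred from ot04-001

def upcase(s):
    return s.upper()

def delspace(s):
    return s.replace(" ", "")

def run_pipe(processors, item):
    result = item
    for processor in processors:
        result = processor(result)
    return result

def swap(processors, p1, p2):
    i, j = processors.index(p1), processors.index(p2)
    processors[i], processors[j] = processors[j], processors[i]

text = "some lowercase text"
processors = [oto0, eto3, upcase, delspace]

run1 = run_pipe(processors, text)   # S0M3L0W3RCAS3T3XT
swap(processors, oto0, upcase)      # now [upcase, eto3, oto0, delspace]
run2 = run_pipe(processors, text)   # SOMELOWERCASETEXT
swap(processors, eto3, upcase)      # now [eto3, upcase, oto0, delspace]
run3 = run_pipe(processors, text)   # SOM3LOW3RCAS3T3XT

print(run1, run2, run3, sep="\n")
```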

To reiterate, this ability to swap components at runtime, and re-run the pipe (with output going to a different file each time), allows you to experiment with / validate your pipeline logic, and/or to do performance comparison of different pipeline orderings.

This updated version of pipe_controller is available on Bitbucket, here:

Python pipe_controller module.

- Vasudev Ram - Dancing Bison Enterprises


Friday, October 5, 2012

fmap(), "inverse" of Python map() function

By Vasudev Ram

fmap() is a function I created, which is a kind of inverse of the built-in Python map() function. It has probably been discovered/created by many others before (though they may have called it by different names), but I thought of it just now, based on some code I wrote recently (*), so blogging about it.

The comments in the source code below describe what fmap does. It is straightforward.

(*) That "recent code" refers to this snippet:

result = item
for processor in self._processors:
    result = processor(result)

from my blog post about the release of PipeController v0.1; that snippet basically does the same thing as fmap(), but the snippet is specific to PipeController, whereas fmap() was extracted from that, and generalized to be reusable in other programs.

fmap.py source code:

(You can also get the fmap.py code from a pastebin here, since Blogger sometimes doesn't render inline code too well; tabs show up as one space, for example.)

# fmap.py

# Author: Vasudev Ram - http://www.dancingbison.com

# fmap() is a Python function which is a kind of inverse of the 
# built-in Python map() function.
# The map() function is documented in the Python interpreter as
# follows:

"""
>>> print map.__doc__
map(function, sequence[, sequence, ...]) -> list

Return a list of the results of applying the function to the items of
the argument sequence(s).  If more than one sequence is given, the
function is called with an argument list consisting of the corresponding
item of each sequence, substituting None for missing values when not all
sequences have the same length.  If the function is None, return a list of
the items of the sequence (or a list of tuples if more than one sequence).
"""

# The fmap() function does the inverse, in a sense.
# It returns the result of applying a list of functions to a 
# given argument.
# TODO: Later extend the function to also work on a sequence of 
# arguments like map() does.

import string

def fmap(function_list, argument):
    result = argument
    for function in function_list:
        #print "calling " + function.__name__ + "(" + repr(result) + ")"
        result = function(result)
    return result

def times_two(arg):
    return arg * 2

def square(arg):
    return arg * arg

def upcase(s):
    return string.upper(s)

def delspace(s):
    return string.replace(s, ' ', '')

def main():

    print

    function_list = [ times_two, square ]
    for argument in range(5):
        fmap_result = fmap(function_list, argument)
        print "argument:", argument, ": fmap result:", fmap_result

    print

    function_list = [ upcase, delspace ]
    for argument in [ "the quick brown fox", "the lazy dog" ]:
        fmap_result = fmap(function_list, argument)
        print "argument:", argument, ": fmap result:", fmap_result

if __name__ == "__main__":
    main()

# EOF: fmap.py

Output of running a test program for fmap():

$> python fmap.py

argument: 0 : fmap result: 0
argument: 1 : fmap result: 4
argument: 2 : fmap result: 16
argument: 3 : fmap result: 36
argument: 4 : fmap result: 64

argument: the quick brown fox : fmap result: THEQUICKBROWNFOX
argument: the lazy dog : fmap result: THELAZYDOG
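The TODO in the source mentions extending fmap() to work on a sequence of arguments, like map() does. One way that might look is sketched below, in Python 3 syntax rather than the Python 2 of the listing above. The name fmap_seq is hypothetical, and the use of functools.reduce is just one possible way to write the loop.

```python
from functools import reduce

def fmap(function_list, argument):
    """Apply each function in function_list, in order, to the running result."""
    return reduce(lambda result, function: function(result),
                  function_list, argument)

def fmap_seq(function_list, arguments):
    """Hypothetical extension: run fmap() on every item of a sequence."""
    return [fmap(function_list, argument) for argument in arguments]

def times_two(arg):
    return arg * 2

def square(arg):
    return arg * arg

print(fmap_seq([times_two, square], range(5)))  # [0, 4, 16, 36, 64]
```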

UPDATE:

Here are a couple of other interesting posts about functional programming in Python, which I found by doing a Google search for relevant terms:

Dhananjay Nene's two posts on the subject:

Functional Programming With Python - Part 1

Functional Programming With Python – Part 2 - Useful Python Constructs

A StackOverflow thread: Why program functionally in Python?

- Vasudev Ram - Dancing Bison Enterprises

