Thursday, June 27, 2013

Follow up #1 on "regular" functions versus generator functions in Python

By Vasudev Ram

Two blog posts ago, in:

Exploring "regular" functions versus generator functions in Python,

I said that I would describe the points of interest that I found about the two versions, one that used a regular function and the other that used a generator function.

Here are some of those points:

1. The generator feature of the Python language was originally developed to enable programmers to create functions that could generate a series of values, not just a single value.

Note: "a series of values" does not just mean a function that returns multiple values all at once. That was trivially possible in Python even before generators were added to the language, like this:
>>> def foo():
...     return 1, 2, 3
...
>>> a = foo()
>>> a
(1, 2, 3)
This is just a function whose return value consists of more than one item. Actually, since the value returned is really a tuple, you can argue that there is only one return value:
>>> a = foo()
>>> type(a)
<type 'tuple'>

Even if you do this:
>>> a, b, c = foo()
>>> a
1
>>> b
2
>>> c
3
what is happening is that the function returns a single value, a tuple, and then tuple unpacking is used to assign the items of the returned tuple to a, b, and c.

But generators can "return" (or rather "yield", using the yield keyword) a series of values over time, on demand.
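
As a minimal sketch of what "on demand" means here (the function name count_up is made up for illustration and is not from the earlier post), a generator function pauses at each yield and resumes only when the caller asks for the next value:

```python
def count_up(limit):
    """Yield 1, 2, ..., limit one at a time instead of returning them all at once."""
    n = 1
    while n <= limit:
        yield n  # execution pauses here until the caller asks for the next value
        n += 1

gen = count_up(3)    # calling the function creates a generator; no body code has run yet
first = next(gen)    # runs the body up to the first yield
rest = list(gen)     # drains whatever values are left
```

Here first is 1 and rest is [2, 3]; the values are produced one at a time as they are requested, rather than being bundled into a single tuple up front.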

And that is what the lazy_text_proc.py program does; it yields a series of processed lines, on demand, in the for loop. So far, so good - nothing beyond what the Python docs already say.

2. Now, coming to the differences between the two programs:

I initially thought that the generator version would use less memory than the non-generator version. Studying the code of both versions bears this out: the non-generator version builds up a list of all the processed lines in memory as the input file is read and processed, and only then returns the list to its caller. The generator version does not build up any list; it yields each processed line to its caller, one per iteration of the for loop.
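
For readers who do not have the earlier post at hand, the two shapes being compared look roughly like the sketch below. This is a reconstruction under assumptions, not the original code: the function name lazy_text_proc is borrowed from the program's file name, and process_line is assumed to do a plain substring replacement.

```python
import os
import tempfile

def process_line(line, old_pat, new_pat):
    # Assumed behaviour for this sketch: plain substring replacement.
    return line.replace(old_pat, new_pat)

def regular_text_proc(filename, old_pat, new_pat):
    """Non-generator version: builds the whole list before returning it."""
    new_lines = []
    with open(filename) as fp:
        for line in fp:
            new_lines.append(process_line(line, old_pat, new_pat))
    return new_lines

def lazy_text_proc(filename, old_pat, new_pat):
    """Generator version: produces one processed line at a time, with no list."""
    with open(filename) as fp:
        for line in fp:
            yield process_line(line, old_pat, new_pat)

# Small throwaway input file, just for the demonstration.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("old one\nold two\n")
    path = tmp.name

eager = regular_text_proc(path, "old", "new")     # a complete list
lazy = list(lazy_text_proc(path, "old", "new"))   # generator, drained into a list here
os.remove(path)
```

Both calls produce the same lines; the difference is only in when they exist in memory. The list in regular_text_proc grows with the size of the file, while lazy_text_proc produces one processed line at a time.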

But the non-generator version of this program does not have to return a list of all the processed lines to its caller (for printing or other further processing). Instead, the code that the caller would use to print or further process each line can simply be put in the non-generator function itself, below the line:
new_line = process_line(line, old_pat, new_pat)
which eliminates the list and leaves this code:
# Process a text file, calling process_line on each line.
def regular_text_proc(filename, old_pat, new_pat):
    with open(filename) as fp:
        for line in fp:
            new_line = process_line(line, old_pat, new_pat)
            # process_line_more could just be a print,
            # or could be something else.
            result = process_line_more(new_line)

With this change, the non-generator version would take about the same amount of memory as the generator version.

So what is the advantage of generators in this case?

None, practically (*), except for the possibly clearer code resulting from the separation of concerns of the reading and processing stages.

The moral of the story seems to be that one should not apply language features blindly, but check whether they really are useful and achieve the desired result, and also whether that result can be achieved more simply.

This is not to say that generators are not useful at all, obviously (**); it is just that the use of generators in this example does not seem to be of much benefit when compared to the modified non-generator version.

(*) I think I would still use the generator version in this case, due to the benefit of separation of concerns - the code just seems somewhat cleaner / easier to understand and maintain in this way.
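
A rough sketch of that separation of concerns, using made-up names (processed_lines, and an in-memory list standing in for the file): the generator knows only how to transform lines, and each caller decides what to do with them.

```python
def processed_lines(lines, old_pat, new_pat):
    """Producer: knows how to transform lines, not what happens to them next."""
    for line in lines:
        yield line.replace(old_pat, new_pat)

source = ["old one\n", "old two\n"]  # stands in for an open file

# Consumer 1: collect the results.
collected = list(processed_lines(source, "old", "new"))

# Consumer 2: compute a total without storing any processed line.
total_chars = sum(len(line) for line in processed_lines(source, "old", "new"))
```

Neither consumer required a change to the producer, which is the maintainability point; in the modified non-generator version, each new use would mean editing the body of the loop.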

(**) For instance, here is a case where the use of generators may improve performance while still simplifying the code:

Use generators for fetching large db record sets (Python recipe). I had seen an example similar to this in the 2nd Edition of the Python Cookbook by O'Reilly Media, but don't have the book handy right now. This example seems to be roughly the same as that cookbook one, though, IIRC.
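
The general idea can be sketched as follows. This is my own approximation, not the recipe's exact code: the in-memory SQLite table, the function name result_iter and the batch size are all assumptions chosen for illustration. Rows are fetched from the cursor in batches with fetchmany, but handed to the caller one at a time.

```python
import sqlite3

def result_iter(cursor, batch_size=1000):
    """Yield rows one at a time while fetching them from the cursor in batches."""
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        for row in rows:
            yield row

# Hypothetical table, created in memory just so the sketch runs on its own.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO items (name) VALUES (?)",
                 [("item%d" % i,) for i in range(10)])

cur = conn.execute("SELECT id, name FROM items ORDER BY id")
names = [name for _id, name in result_iter(cur, batch_size=3)]
conn.close()
```

The caller writes an ordinary for loop (or comprehension) over the rows, while at most batch_size rows are held by the generator at any time, instead of the whole result set as with fetchall.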

I have only a basic understanding of generators myself, and wrote these posts to explore them, hence the title of the first post. Comments are welcome.

- Vasudev Ram - Dancing Bison Enterprises
