jugad2 - Vasudev Ram on software innovation: Python-generators

Showing posts with label Python-generators. Show all posts

Friday, June 1, 2018

Porting a prime checker from Q to Python (with improvements)

By Vasudev Ram

Q => Py

Hi readers,

I was checking out the Q programming language.

It's an interesting language, a descendant of APL, that supports array and functional programming styles. Some information about Q from its Wikipedia page:

Paradigm: Array, functional
Designed by: Arthur Whitney
Developer: Kx Systems
First appeared: 2003
Typing discipline: Dynamic, strong
Influenced by: A+, APL, Scheme, K

Q is a proprietary array processing language developed by Arthur Whitney and commercialized by Kx Systems. The language serves as the query language for kdb+, a disk based and in-memory, column-based database. kdb+ is based upon K, a terse variant of APL. Q is a thin wrapper around K, providing a more readable, English-like interface.

I had tried out Q a bit recently, using one of the tutorials.

Then, while reading the Q Wikipedia page again, I saw this paragraph:

[
When x is an integer greater than 2, the following function will return 1 if it is a prime, otherwise 0:

{min x mod 2_til x}
The function is evaluated from right to left:

"til x" enumerate the positive integers less than x.
"2_" drops the first two elements of the enumeration (0 and 1).
"x mod" performs modulo division between the original integer and each value in the truncated list.
"min" find the minimum value of the list of modulo result.
]

It's basically an algorithm in Q to find whether a given number x (> 2) is prime or not [1]. The algorithm returns 1 if x is a prime, else 0. Notice how concise it is. That conciseness is a property of the APL family of languages, such as APL, J and Q. In fact Q is slightly less concise than some of the others, I've read, because it puts a thin English-like wrapper on the hard-core APL symbolic syntax. But all of them are still very concise.

So I thought of implementing that algorithm in Python, just for fun. I first wrote a naive version, more or less a port of the Q version. Then I rewrote that first version in a more functional way. Then I realized that there are other opportunities for improving the code [2], and implemented a few of them.

So I combined the few different versions of the is_prime_* functions (where * = 1, 2, 3, etc.) that I had written, in a single program, with a driver function to exercise all of them. The code is in file is_prime.py, shown further below. There are comments in the program that explain the logic and improvements or differences of the various is_prime function versions.

[1] Prime_number

[2] There are obviously many other ways of checking if numbers are prime, and many of them are faster than these approaches; see [3]. These are not among the most efficient ways, or even close; I was just experimenting with different ways of rewriting and refactoring the code, after the initial port from Q to Python. Even the original Q version in the Wikipedia page was not meant to be a fast version, since it does not use any of the improvements I mention below, let alone using any special Q-specific algorithm or language feature, or advanced general prime-finding algorithms.

[3] Primality test

There might still be some scope for further changes or improvements, some of which I may do in a future post. A few such improvements are:

1. Don't divide x by all values up to x - 1. Instead, only check up to the square root of x. This is a standard improvement often taught to programming beginners; in fact, it is mentioned in the Wikipedia articles about primes.

2. Terminate early when the first remainder equal to 0 is found. This avoids unnecessarily computing all the other remainders. I did that in is_prime_3().

3. Don't check for divisibility by any even number > 2 (call it b), because if any such b divides x evenly, then so will 2, and we would have checked earlier if 2 divides x evenly.

4. Create a function for the output statements, most of which are common across the different is_prime versions, and call that function instead from those places.

5. Use a generator to lazily yield divisors, and when any divisor gives a zero remainder, exit early, since it means the number is not prime. Done in is_prime_4().

Other ways of doing it may include use of some other functional programming features of Python such as filter(), itertools.takewhile/dropwhile, etc. (The itertools module has many functions and is very interesting, but that is a subject for a different post.)

I also observed some interesting behavior when running the program with large ranges of inputs for prime number checking. Will analyze that a bit and write about my opinions on that in a future post.

Here is the code for is_prime.py:

# File: isprime.py
# A port of a primality checking algorithm from the Q language to Python, 
# plus a few improvements / variations, using Python features.
# Ref: https://en.wikipedia.org/wiki/Q_(programming_language_from_Kx_Systems)
# Search for the word "prime" in that page to see the Q code.

# Author: Vasudev Ram
# Copyright 2018 Vasudev Ram
# Web site: https://vasudevram.github.io
# Blog: https://jugad2.blogspot.com
# Product store: https://gumroad.com/vasudevram

from __future__ import print_function
from debug1 import debug1

import sys

# Naive, mostly procedural port from the Q version.
def is_prime_1(x):
    # Guard against invalid argument.
    assert x > 2, "in is_prime_1: x > 2 failed"
    # The range of integers from 0 to x - 1
    til_x = range(x)
    # The range without the first 2 items, 0 and 1.
    divs = til_x[2:]
    # The remainders after dividing x by each integer in divs.
    mods = map(lambda d: x % d, divs)
    # x is prime if the minimum-valued remainder equals 1.
    return min(mods) == 1

# Shorter, more functional version, with nested calls 
# to min, map and range.
def is_prime_2(x):
    assert x > 2, "in is_prime_2: x > 2 failed"
    # Eliminate slicing used in is_prime_1, just do range(2, x).
    return min(map(lambda d: x % d, range(2, x))) == 1

# Early-terminating version, when 1st remainder equal to 0 found, 
# using a list for the range of divisors.
def is_prime_3(x):
    assert x > 2, "in is_prime_3: x > 2 failed"
    divs = range(2, x)
    # Check if x is divisible by any integer in divs; if so, 
    # x is not prime, so terminate early.
    debug1("in is_prime_3, x", x)
    for div in divs:
        debug1("  in loop, div", div)
        if x % div == 0:
            return False
    # If we reach here, x was not divisible by any integer in 
    # 2 to x - 1, so x is prime.
    return True

# Generator function to yield the divisors one at a time, to 
# avoid creating the whole list of divisors up front.
def gen_range(start, x):
    assert start > 0, "in gen_range, start > 0 failed"
    assert x > start, "in gen_range, x > start failed"
    i = start
    while i < x:
        yield i
        i += 1

# Early-terminating version, when 1st remainder equal to 0 found, 
# using a generator for the range of divisors.
def is_prime_4(x):
    assert x > 2, "in is_prime_4, x > 2 failed"
    divs = gen_range(2, x)
    debug1("in is_prime_4, x", x)
    for div in divs:
        debug1("  in loop, div", div)
        if x % div == 0:
            return False
    return True

def check_primes(low, high):
    assert low <= high, "in check_primes, low <= high failed"

    """
    print("\nWith a for loop:")
    for x in range(low, high + 1):
        print(x, "is" if is_prime_1(x) else "is not", "prime,", end=" ")
    print()
    """

    print("\nWith nested function calls:")
    output = [ str(x) + (" prime" if is_prime_2(x) else " not prime") \
        for x in range(low, high + 1) ]
    print(", ".join(output))

    print("\nWith a list of divisors and early termination:")
    output = [ str(x) + (" prime" if is_prime_3(x) else " not prime") \
        for x in range(low, high + 1) ]
    print(", ".join(output))

    print("\nWith a generator of divisors and early termination:")
    output = [ str(x) + (" prime" if is_prime_4(x) else " not prime") \
        for x in range(low, high + 1) ]
    print(", ".join(output))

def main():
    try:
        low = int(sys.argv[1])
        high = int(sys.argv[2])
        if low <= 2:
            print("Error: Low value must be > 2.")
            sys.exit(1)
        if high < low:
            print("Error: High value must be >= low value.")
            sys.exit(1)
        print("Checking primality of integers between {} and {}".format(low, high))
        check_primes(low, high)
        sys.exit(0)
    except ValueError as ve:
        print("Caught ValueError: {}".format(str(ve)))
    except IndexError as ie:
        print("Caught IndexError: {}".format(str(ie)))
    sys.exit(1)

if __name__ == '__main__':
    main()

And here are some runs of the output, below, both for normal and error cases. Note that I used my debug1() debugging utility function in a few places, to show what divisors are being used, in a few places. This helps show that the early termination logic works. To turn off debugging output, simply use the -O option, like this example:

python -O is_prime.py other_args

This improved version of the debug1 function (get it here), unlike the earlier version that was shown in this blog post:

A simple Python debugging function

, does not require the user to set any environment variables like VR_DEBUG, since it uses Python's built-in __debug__ variable instead. So to enable debugging, nothing extra needs to be done, since that variable is set to True by default. To disable debugging, all we have to do is pass the -O option on the python command line.

Here is the prime program's output:

Try this one later (if you're trying out the program), since it takes 
longer to run. You may observe some interesting behavior:

$ python -O is_prime.py 3 10000 | less

where less is the Unix 'less' command, a text pager. Any command-line text pager 
that can read standard input (from a pipe) will work.

Try some of these below (both normal and error cases) before the one above:

Some error-handling cases:

$ python -O is_prime.py 0 0
Error: Low value must be > 2.

$ python -O is_prime.py 2 0
Error: Low value must be > 2.

$ python -O is_prime.py 3 0
Error: High value must be >= low value.

$ python -O is_prime.py 4 2
Error: High value must be >= low value.

Some normal cases:

$ python -O is_prime.py 3 3
Checking primality of integers between 3 and 3

With nested function calls:
3 prime

With a list of divisors and early termination:
3 prime

With a generator of divisors and early termination:
3 prime

To show that the early termination logic works, run the program without 
the -O option. 
Here is one such run. Due to more debugging output, I've only checked 
two numbers, 4 and 5. But you can try with any number of values if you 
page the output, or redirect it to a file.

$ python is_prime.py 4 5
Checking primality of integers between 4 and 5

With nested function calls:
4 not prime, 5 prime

With a list of divisors and early termination:
in is_prime_3, x: 4
  in loop, div: 2
in is_prime_3, x: 5
  in loop, div: 2
  in loop, div: 3
  in loop, div: 4
4 not prime, 5 prime

With a generator of divisors and early termination:
in is_prime_4, x: 4
  in loop, div: 2
in is_prime_4, x: 5
  in loop, div: 2
  in loop, div: 3
  in loop, div: 4
4 not prime, 5 prime

You can see from the above run that for 4, the checking stops early, 
at the first divisor (2), in fact, because it evenly divides 4.
But for 5, all divisors from 2 to 4 are checked, because 5 has 
no prime factors (except itself and 1).

And here is a run checking for primes between 3 and 30:

$ python -O is_prime.py 3 30
Checking primality of integers between 3 and 30

With nested function calls:
3 prime, 4 not prime, 5 prime, 6 not prime, 7 prime, 8 not prime, 9 not prime, 
10 not prime, 11 prime, 12 not prime, 13 prime, 14 not prime, 15 not prime, 
16 not prime, 17 prime, 18 not prime, 19 prime, 20 not prime, 21 not prime, 
22 not prime, 23 prime, 24 not prime, 25 not prime, 26 not prime, 27 not prime, 
28 not prime, 29 prime, 30 not prime

With a list of divisors and early termination:
3 prime, 4 not prime, 5 prime, 6 not prime, 7 prime, 8 not prime, 9 not prime, 
10 not prime, 11 prime, 12 not prime, 13 prime, 14 not prime, 15 not prime, 
16 not prime, 17 prime, 18 not prime, 19 prime, 20 not prime, 21 not prime, 
22 not prime, 23 prime, 24 not prime, 25 not prime, 26 not prime, 27 not prime, 
28 not prime, 29 prime, 30 not prime

With a generator of divisors and early termination:
3 prime, 4 not prime, 5 prime, 6 not prime, 7 prime, 8 not prime, 9 not prime, 
10 not prime, 11 prime, 12 not prime, 13 prime, 14 not prime, 15 not prime, 
16 not prime, 17 prime, 18 not prime, 19 prime, 20 not prime, 21 not prime, 
22 not prime, 23 prime, 24 not prime, 25 not prime, 26 not prime, 27 not prime, 
28 not prime, 29 prime, 30 not prime

You can see that all three functions (is_prime_2 to 4) give the same results. 
(I commented out the call to the naive function version, is_prime_1, after 
a few runs (not shown), so none of these outputs shows its results, but they 
are the same as the others, except for minor formatting differences, due to 
the slightly different output statements used.

I also timed the program for finding primes up to 1000 and 10,000 
(using my own simple command timing program written in Python - not shown).

Command: python -O is_prime.py 3 1000
Time taken: 2.79 seconds
Return code: 0

Command: python -O is_prime.py 3 10000
Time taken: 66.28 seconds
Return code: 0

Related links:

Q (programming_language from Kx_Systems

Kx Systems

Kdb+

Column-oriented DBMS

In-memory database

If you want to try out Q, Kx Systems a free version available for download for non-commercial use, here: Q downloads

Enjoy.

- Vasudev Ram - Online Python training and consulting

Get fast web hosting with A2Hosting.com

Get updates (via Gumroad) on my forthcoming apps and content.

Jump to posts: Python * DLang * xtopdf

Subscribe to my blog by email

My ActiveState Code recipes

Follow me on: LinkedIn * Twitter

Are you a blogger with some traffic? Get Convertkit:

Email marketing for professional bloggers

Share |

Saturday, March 19, 2016

Python generators are pluggable

By Vasudev Ram

Generator image attribution

While working on a Python project, it crossed my mind that generators could be of use in it. A little research made me realize that generators are pluggable, i.e. they can be passed to functions, and then be used within those functions. This is because generators are a kind of Python object, and any Python object can be passed as an argument to a function.

This in turn is because almost everything in Python is an object (including generators), similar to how almost everything in Unix is a file. Both those concepts can enable some powerful operations.

Here is a program that demonstrates passing generator objects as arguments to another function, and then using those generators inside it:

# Program to show that generators are pluggable, i.e.,
# can be passed as function arguments, and then used 
# inside those functions to which they are passed.
# Author: Vasudev Ram - http://jugad2.blogspot.com
# Copyright 2016 Vasudev Ram

def gen_squares(fro, to):
    '''A generator function that returns a generator 
    that returns squares of values in a range.'''
    for val in range(fro, to + 1):
        yield val * val

def gen_cubes(fro, to):
    '''A generator function that returns a generator 
    that returns cubes of values in a range.'''
    for val in range(fro, to + 1):
        yield val * val * val

def use(gen):
    print "In use() function:"
    print "Using:", gen
    print "Items:",
    for item in gen:
        print item,
    print

print "Pluggable Python generators.\n"
print "In main module:"
print "type(use): ", type(use)
print "use:", use
print
print "type(gen_squares): ", type(gen_squares)
print "gen_squares: ", gen_squares
print "type(gen_squares(1, 5)): ", type(gen_squares(1, 5))
print "gen_squares(1, 5): ", gen_squares(1, 5)
print
print "type(gen_cubes): ", type(gen_cubes)
print "gen_cubes: ", gen_cubes
print "type(gen_cubes(1, 5)): ", type(gen_cubes(1, 5))
print "gen_cubes(1, 5): ", gen_cubes(1, 5)
print
for gen_obj in (gen_squares(1, 5), gen_cubes(1, 5)):
    use(gen_obj)
    print

Run the program with:

python pluggable_generators.py

Here is the output:

Pluggable Python generators.

In main module:
type(use):  <type 'function'>
use: <function use at 0x0202C3B0>

type(gen_squares):  <type 'function'>
gen_squares:  <function gen_squares at 0x0207BF30>
type(gen_squares(1, 5)):  <type 'generator'>
gen_squares(1, 5):  <generator object gen_squares at 0x020869B8>

type(gen_cubes):  <type 'function'>
gen_cubes:  <function gen_cubes at 0x0207BFB0>
type(gen_cubes(1, 5)):  <type 'generator'>
gen_cubes(1, 5):  <generator object gen_cubes at 0x020869B8>

In use() function:
Using: <generator object gen_squares at 0x020869B8>
Items: 1 4 9 16 25

In use() function:
Using: <generator object gen_cubes at 0x020869E0>
Items: 1 8 27 64 125

As you can see, I've printed both type(obj) and obj for many of the objects shown, to make it more clear what is going on. Also, a generator function and a generator object (the result of calling a generator function), are two different things, so they are printed separately as well.

A few points about generators and their use:

They can potentially lead to less memory usage, since values are only generated on demand, i.e. evaluation is lazy.

They can help with separation of concerns, a key technique that leads to program modularity; the code for the actual generator functions like gen_squares and gen_cubes does not have to be embedded in the use() function, which makes both the generators and the use() function more reusable.

Someone could say here that we could write gen_squares and gen_cubes as regular functions instead of as generator functions, and then just call them from use(), so their code still does not have to be embedded in the use() function, and that would be right. But in that case, the calls to them would return lists, and if the lists were very large, that would use a lot of memory, and maybe crash or slow down the program. Those issues will not happen with generators, though, because each item is generated just before it is used, and then it is thrown away, not stored. So the memory needed is not proportional to the number of items generated.

Here are some links about Python generators:

Generators - Python Wiki

Stack Overflow - Understanding generators in Python

The image at the top of the post is a Ferranti two-phase AC generator set.

- Vasudev Ram - Online Python training and programming

Signup to hear about new products and services I create.

Posts about Python Posts about xtopdf

My ActiveState recipes

Share |

Thursday, June 27, 2013

Follow up #1 on "regular" functions versus generator functions in Python

By Vasudev Ram

Two blog posts ago, in:

Exploring "regular" functions versus generator functions in Python,

I said that I would describe the points of interest that I found about the two versions, one that used a regular function and the other that used a generator function.

Here are some of those points:

1. The generator feature of the Python language was originally developed to enable programmers to create functions that could generate a series of values, not just a single value.

Note: "a series of values" does not mean just a function that can return multiple values (all at the same time). That could be done, trivially, in Python, like this, before generators were added to the language:

>>> def foo():
...     return 1, 2, 3
...
>>> a = foo()
>>> a
(1, 2, 3)

This is just a function whose return value consists of more than one item. Actually, since the value returned is really a tuple, you can argue that there is only one return value:

>>> a = foo()
>>> type(a)

Even if you do this:

>>> a, b, c = foo()
>>> a
1
>>> b
2
>>> c
3

what is happening is that the function returns a single value, a tuple, and then tuple unpacking is used to assign the items of the returned tuple to a, b, and c.

But generators can "return" (or rather, "yield", using the yield keyword), a series of values over time, on demand.

And that is what the lazy_text_proc.py program does; it yields a series of processed lines, on demand, in the for loop. So far, so good - nothing new that the Python docs don't say.

2. Now, coming to the differences between the two programs:

I initially thought (correctly, as it turned out), that the generator version would use less memory than the non-generator version. That seems to be right, on studying the code of both versions, since the non-generator version builds up a list of processed lines in memory as the input file is read and processed, and only then returns the list to its caller, while the generator version does not build up any list, it only returns each processed line to its caller, in the for loop, on each iteration.

But the non-generator version of this program does not necessarily have to return a list of all the processed lines to its caller (for printing or other further processing). Instead, the code that the caller would use to print or process the lines further, can simply be put in the non-generator function, below the line:

new_line = process_line(line, old_pat, new_pat)

and eliminate the use of the list, ending up with this code:

# Process a text file, calling process_line on each line.
def regular_text_proc(filename, old_pat, new_pat):

    with open(filename) as fp:
        for line in fp:
            new_line = process_line(line, old_pat, new_pat)
            result = process_line_more(new_line) # where process_line_more could just be a print,
            # or could be something else.

With this change, the non-generator version would take about the same amount of memory as the generator version.

So what is the advantage of generators in this case?

None, practically (*), except for the possibly clearer code resulting from the separation of concerns of the reading and processing stages.

The moral of the story seems to be that one should not apply language features blindly, but check whether they really are useful and achieve the desired result, and also whether that result can be achieved more simply.

This is not to say that generators are not useful at all, obviously (**); it is just that the use of generators in this example does not seem to be of much benefit (if compared to the modified non-generator version).

(*) I think I would still use the generator version in this case, due to the benefit of separation of concerns - the code just seems somewhat cleaner / easier to understand and maintain in this way.

(**) For example, here is an example where the use of generators may improve performance while still simplifying the code:

Use generators for fetching large db record sets (Python recipe). I had seen an example similar to this in the 2nd Edition of the Python Cookbook by O'Reilly Media, but don't have the book handy right now. This example seems to be roughly the same as that cookbook one, though, IIRC.

I have only a basic understanding of generators myself, and wrote these posts to explore them, hence the title of the first post. Comments are welcome.

- Vasudev Ram - Dancing Bison Enterprises

Contact me

Share |

jugad2 - Vasudev Ram on software innovation

Pages

Friday, June 1, 2018

Porting a prime checker from Q to Python (with improvements)

Saturday, March 19, 2016

Python generators are pluggable

Thursday, June 27, 2013

Follow up #1 on "regular" functions versus generator functions in Python

Blog Archive

Labels