
Thursday, April 18, 2019

Python's dynamic nature: sticking an attribute onto an object


- By Vasudev Ram - Online Python training / SQL training / Linux training



Hi, readers,

[This is a beginner-level Python post.]

Python, being a dynamic language, has some interesting features that some static languages may not have (and vice versa too, of course).

One such feature, which I noticed a while ago, is that you can add an attribute to a Python object even after it has been created. (Conditions apply.)
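As a rough illustration of what such conditions can look like (the post does not spell them out, so treat this as one reading): instances of ordinary classes and function objects accept new attributes, while a bare object() and instances of classes that define __slots__ do not. The names Plain, Slotted and can_stick below are made up for this sketch.

```python
class Plain(object):
    """An ordinary class; its instances have a __dict__."""
    pass

class Slotted(object):
    """A class with __slots__; its instances have no __dict__."""
    __slots__ = ("x",)

def can_stick(obj):
    """Return True if a new attribute can be stuck onto obj."""
    try:
        obj.stuck_on = 42
    except AttributeError:
        return False
    return True

print(can_stick(Plain()))       # True
print(can_stick(Slotted()))     # False
print(can_stick(object()))      # False
print(can_stick(lambda: None))  # True: functions are objects too
```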

I had used this feature some time ago to work around some implementation issue in a rudimentary RESTful server that I created as a small teaching project. It was based on the BaseHTTPServer module.

Here is a (different) simple example program, stick_attrs_onto_obj.py, that demonstrates this Python feature.
My informal term for this feature is "sticking an attribute onto an object" after the object is created.

Since the program is simple, and there are enough comments in the code, I will not explain it in detail.
# stick_attrs_onto_obj.py

# A program to show:
# 1) that you can "stick" attributes onto a Python object after it is created, and
# 2) one use of this technique, to count the number of calls to a function.

# Copyright 2019 Vasudev Ram
# Web site: https://vasudevram.github.io
# Blog: https://jugad2.blogspot.com
# Training: https://jugad2.blogspot.com/p/training.html
# Product store: https://gumroad.com/vasudevram
# Twitter: https://twitter.com/vasudevram

from __future__ import print_function

# Define a function.
def foo(arg):
    # Print something to show that the function has been called.
    print("in foo: arg = {}".format(arg))
    # Increment the "stuck-on" int attribute inside the function.
    foo.call_count += 1

# A function is also an object in Python.
# So we can add attributes to it, including after it is defined.
# I call this "sticking" an attribute onto the function object.
# The statement below defines the attribute with an initial value, 
# which is changeable later, as we will see.
foo.call_count = 0

# Print its initial value before any calls to the function.
print("foo.call_count = {}".format(foo.call_count))

# Call the function a few times.
for i in range(5):
    foo(i)

# Print the attribute's value after those calls.
print("foo.call_count = {}".format(foo.call_count))

# Call the function a few more times.
for i in range(3):
    foo(i)

# Print the attribute's value after those additional calls.
print("foo.call_count = {}".format(foo.call_count))

And here is the output of the program:
$ python stick_attrs_onto_obj.py
foo.call_count = 0
in foo: arg = 0
in foo: arg = 1
in foo: arg = 2
in foo: arg = 3
in foo: arg = 4
foo.call_count = 5
in foo: arg = 0
in foo: arg = 1
in foo: arg = 2
foo.call_count = 8

There are other ways to get the call count of a function, such as using a profiler, a closure or a decorator. But this way is really simple. And as you can see from the code, it can also be used to find the number of calls made to the function between any two points in the program. For that, we just have to store the call count in a variable at the first point, and subtract that value from the call count at the second point. In the above program, that would be 8 - 5 = 3, which matches the 3 calls to function foo made by the 2nd for loop.
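Here is a minimal sketch of the decorator alternative, combined with the two-point subtraction described above. It is not from the original program: the decorator name count_calls and the variables count_before and calls_between are made up for illustration. The decorator still works by sticking a call_count attribute onto a function object, in this case the wrapper.

```python
import functools

def count_calls(func):
    """Decorator that sticks a call_count attribute onto the wrapper."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.call_count += 1
        return func(*args, **kwargs)
    wrapper.call_count = 0
    return wrapper

@count_calls
def foo(arg):
    return arg * 2

for i in range(5):
    foo(i)

# Remember the count at the first point in the program.
count_before = foo.call_count

for i in range(3):
    foo(i)

# Subtract to get the number of calls made between the two points.
calls_between = foo.call_count - count_before
print("total calls = {}, calls between the two points = {}".format(
    foo.call_count, calls_between))
```

The function body stays free of counting code here, at the cost of a little more machinery than the plain stuck-on attribute.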

Enjoy.

- Vasudev Ram - Online Python training and consulting

I conduct online courses on Python programming, Unix / Linux commands and shell scripting and SQL programming and database design, with course material and personal coaching sessions.

The course details and testimonials are here.

Contact me for details of course content, terms and schedule.

Try FreshBooks: Create and send professional looking invoices in less than 30 seconds.

Getting a new web site or blog, and want to help preserve the environment at the same time? Check out GreenGeeks.com web hosting.

Sell your digital products via DPD: Digital Publishing for Ebooks and Downloads.

Learning Linux? Hit the ground running with my vi quickstart tutorial. I wrote it at the request of two Windows system administrator friends who were given additional charge of some Unix systems. They later told me that it helped them to quickly start using vi to edit text files on Unix. Of course, vi/vim is one of the most ubiquitous text editors around, and works on most other common operating systems and on some uncommon ones too, so the knowledge of how to use it will carry over to those systems too.

Check out WP Engine, powerful WordPress hosting.

Creating online products for sale? Check out ConvertKit, email marketing for online creators.

Teachable: feature-packed course creation platform, with unlimited video, courses and students.

Posts about: Python * DLang * xtopdf

My ActiveState Code recipes

Follow me on:


Monday, April 1, 2019

rmline: Python command-line utility to remove lines from a file [Rosetta Code solution]



- By Vasudev Ram - Online Python training / SQL training / Linux training



Pipeline image attribution

Hi readers,

Long time no post. Sorry.

I saw this programming problem about removing lines from a file on Rosetta Code.

Rosetta Code (Wikipedia) is a programming chrestomathy site.

It's a simple problem, so I thought it would make a good example for Python beginners.

So I wrote a program to solve it. To get the benefits of reuse and composition (at the command line), I wrote it as a Unix-style filter.

Here it is, in file rmline.py:
# Author: Vasudev Ram
# Copyright Vasudev Ram
# Product store:
#    https://gumroad.com/vasudevram
# Training (course outlines and testimonials):
#    https://jugad2.blogspot.com/p/training.html
# Blog:
#    https://jugad2.blogspot.com
# Web site:
#    https://vasudevram.github.io
# Twitter:
#    https://twitter.com/vasudevram

# Problem source:
# https://rosettacode.org/wiki/Remove_lines_from_a_file

from __future__ import print_function
import sys

from error_exit import error_exit

# globals 
sa, lsa = sys.argv, len(sys.argv)

def usage():
    print("Usage: {} start_line num_lines file".format(sa[0]))
    print("Usage: other_command | {} start_line num_lines".format(
    sa[0]))

def main():
    # Check number of args.
    if lsa < 3:
        usage()
        sys.exit(0)

    # Convert number args to ints.
    try:
        start_line = int(sa[1])
        num_lines = int(sa[2])
    except ValueError as ve:
        error_exit("{}: ValueError: {}".format(sa[0], str(ve)))

    # Validate int ranges.
    if start_line < 1:
        error_exit("{}: start_line ({}) must be > 0".format(sa[0], 
        start_line))
    if num_lines < 1:
        error_exit("{}: num_lines ({}) must be > 0".format(sa[0], 
        num_lines))

    # Decide source of input (stdin or file).
    if lsa == 3:
        in_fil = sys.stdin
    else:
        try:
            in_fil = open(sa[3], "r")
        except IOError as ioe:
            error_exit("{}: IOError: {}".format(sa[0], str(ioe)))

    end_line = start_line + num_lines - 1

    # Read input, skip unwanted lines, write others to output.
    for line_num, line in enumerate(in_fil, 1):
        if line_num < start_line:
            sys.stdout.write(line)
        elif line_num > end_line:
            sys.stdout.write(line)

    in_fil.close()

if __name__ == '__main__':
    main()
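
The line-skipping logic in main() can also be pulled out into a small generator function, so that it can be tried out without files or pipes. The sketch below is not part of rmline.py; the function name remove_lines and the list sample_lines are made up for illustration.

```python
import sys

def remove_lines(lines, start_line, num_lines):
    """Yield each line except those numbered start_line through
    start_line + num_lines - 1 (line numbers start at 1)."""
    end_line = start_line + num_lines - 1
    for line_num, line in enumerate(lines, 1):
        if line_num < start_line or line_num > end_line:
            yield line

# Five lines, like the contents of f5.txt.
sample_lines = ["line {}\n".format(i) for i in range(1, 6)]

# Remove lines 1 through 2 and write the rest to standard output.
for line in remove_lines(sample_lines, 1, 2):
    sys.stdout.write(line)
```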

Here are a few test text files I tried it on:
$ dir f?.txt/b
f0.txt
f5.txt
f20.txt
f0.txt has 0 bytes.
Contents of f5.txt:
$ type f5.txt
line 1
line 2
line 3
line 4
line 5
f20.txt is similar to f5.txt, but with 20 lines.

Here are a few runs of the program, with output:
$ python rmline.py
Usage: rmline.py start_line num_lines file
Usage: other_command | rmline.py start_line num_lines

$ dir | python rmline.py
Usage: rmline.py start_line num_lines file
Usage: other_command | rmline.py start_line num_lines
Both the above runs show that when called with an invalid set of
arguments (none, in this case), it prints a usage message and exits.
$ python rmline.py f0.txt
Usage: rmline.py start_line num_lines file
Usage: other_command | rmline.py start_line num_lines
Same result, except I gave an invalid first (and only) argument, a file name. See the usage() function in the code to know the right order and types of arguments.
$ python rmline.py -3 4 f0.txt
rmline.py: start_line (-3) must be > 0

$ python rmline.py 2 0 f0.txt
rmline.py: num_lines (0) must be > 0
The above two runs show that it checks for invalid values for the
first two expected integer arguments, start_line and num_lines.
$ python rmline.py 1 2 f0.txt
For an empty input file, as expected, it both removes and prints nothing.
$ python rmline.py 1 2 f5.txt
line 3
line 4
line 5
The above run shows it removing lines 1 through 2 (start_line = 1, num_lines = 2) of the input from the output.
$ python rmline.py 7 4 f5.txt
line 1
line 2
line 3
line 4
line 5
The above run shows that if you give a starting line number larger than the last input line number, it removes no lines of the input.
$ python rmline.py 1 10 f20.txt
line 11
line 12
line 13
line 14
line 15
line 16
line 17
line 18
line 19
line 20
The above run shows it removing the first 10 lines of the input.
$ python rmline.py 6 10 f20.txt
line 1
line 2
line 3
line 4
line 5
line 16
line 17
line 18
line 19
line 20
The above run shows it removing the middle 10 lines of the input.
$ python rmline.py 11 10 f20.txt
line 1
line 2
line 3
line 4
line 5
line 6
line 7
line 8
line 9
line 10
The above run shows it removing the last 10 lines of the input.

Read more:

Pipeline (computing)

Redirection (computing)

The image at the top of the post is of a Unix-style pipeline, with standard input (stdin), standard output (stdout) and standard error (stderr) streams of programs, all independently redirectable, and with the standard output of a preceding command piped to the standard input of the succeeding command in the pipeline. Pipelines and I/O redirection are among the powerful features of the Unix operating system and shell.

Read a brief introduction to those concepts in an article I wrote for IBM developerWorks:

Developing a Linux command-line utility

The above link is to a post about that utility on my blog. For the
actual code for the utility (in C), and for the PDF of the article,
follow the relevant links in the post.

I had originally written the utility for production use for one of the
largest motorcycle manufacturers in the world.

Enjoy.


Tuesday, January 22, 2019

Factorial one-liner using reduce and mul for Python 2 and 3


- By Vasudev Ram - Online Python training / SQL training / Linux training

$ foo bar | baz

Hi, readers,

A couple of days ago, I wrote this post for computing factorials using the reduce and operator.mul functions:

Factorial function using Python's reduce function

A bit later I realized that it can be made into a Python one-liner. Here is the one-liner - it works in both Python 2 and Python 3:
$ py -2 -c "from __future__ import print_function; from functools 
import reduce; from operator import mul; print(list(reduce(mul, 
range(1, fact_num + 1)) for fact_num in range(1, 11)))"
[1, 2, 6, 24, 120, 720, 5040, 40320, 362880, 3628800]

$ py -3 -c "from __future__ import print_function; from functools 
import reduce; from operator import mul; print(list(reduce(mul, 
range(1, fact_num + 1)) for fact_num in range(1, 11)))"
[1, 2, 6, 24, 120, 720, 5040, 40320, 362880, 3628800]

(I've split the commands above across multiple lines to avoid truncation while viewing, but if trying them out, enter each of the above commands on a single line.)

A small but interesting point is that one of the imports is not needed in Python 2, and the other is not needed in Python 3:

- importing print_function is not needed in Py 3, because in 3, print is already a function, not a statement. But it is not an error to import it there, which keeps Py 3 compatible with Py 2 code, where the import is actually needed in order to use print as a function (i.e. for compatibility with Py 3 code), ha ha.

- importing reduce is not needed in Py 2, because in 2, reduce is both a built-in and also available in the functools module - and hence it is not an error to import it.

Because of the above two points, the same one-liner works in both Py 2 and Py 3.
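
For readers who prefer a script to a one-liner, the same statements can be laid out as below. This is only a reformatting of the one-liner; the variable name factorials is made up for illustration.

```python
from __future__ import print_function  # needed only in Python 2
from functools import reduce           # needed only in Python 3
from operator import mul

# Factorials of 1 through 10, each computed with reduce and mul.
factorials = [reduce(mul, range(1, fact_num + 1))
              for fact_num in range(1, 11)]
print(factorials)
```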

Can you think of a similar Python one-liner that gives the same output as the above (and for both Py 2 and 3), but can work without one of the imports above (but by removing the same import for both Py 2 and 3)? If so, type it in a comment on the post.

py is The Python launcher for Windows.

Enjoy.






Monday, January 21, 2019

Factorial function using Python's reduce function


- By Vasudev Ram - Online Python training / SQL training / Linux training



[This is a beginner-level Python post. I label such posts as "python-beginners" in the Blogger labels at the bottom of the post. You can get a sub-feed of all such posts for any label using the label (case-sensitive) in a URL of the form:

https://jugad2.blogspot.com/search/label/label_name where label_name is to be replaced by an actual label,

such as in:

jugad2.blogspot.com/search/label/python-beginners

and

jugad2.blogspot.com/search/label/python
]

Hi, readers,

The factorial function (Wikipedia article) is often implemented in programming languages as either an iterative or a recursive function. Both are fairly simple to implement.

For the iterative version, to find the value of n factorial (written n! in mathematics), you set a variable called, say, product, equal to 1, then multiply it in a loop by each value of a variable i that ranges from 1 to n.

For the recursive version, you define the base case as 0! = 1, and then for all higher values of n factorial, you compute them recursively as the product of n with (n - 1) factorial.

[ Wikipedia article about Iteration. ]

[ Wikipedia article about Recursion in computer_science. ]
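
A minimal sketch of the two versions described above; the function names factorial_iterative and factorial_recursive are made up for illustration.

```python
def factorial_iterative(n):
    """Start product at 1 and multiply it by each i from 1 to n."""
    product = 1
    for i in range(1, n + 1):
        product *= i
    return product

def factorial_recursive(n):
    """Base case 0! = 1; otherwise n! = n * (n - 1)!."""
    if n == 0:
        return 1
    return n * factorial_recursive(n - 1)

for n in (0, 1, 5, 10):
    print("{}! = {} (iterative), {} (recursive)".format(
        n, factorial_iterative(n), factorial_recursive(n)))
```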

Here is another way of doing it, which is also iterative, but uses no explicit loop; instead it uses Python's built-in reduce() function, which is part of the functional programming paradigm or style:
In [179]: for fact_num in range(1, 11):
     ...:     print reduce(mul, range(1, fact_num + 1))
     ...:
1
2
6
24
120
720
5040
40320
362880
3628800
The above snippet (run in the command-line version of IPython under Python 2, after importing mul with "from operator import mul") loops over the values 1 to 10 and computes the factorial of each of those values, using reduce with operator.mul (which is a functional version of the multiplication operator). In more detail: the function call range(1, 11) returns a list with the values 1 to 10, and the for statement iterates over those values, passing each to the expression involving reduce and mul. That expression computes the value's factorial from the iterable returned by the second range call, which produces all the numbers that have to be multiplied together to get the factorial of fact_num.

The Python docstring for reduce:
reduce.__doc__: reduce(function, sequence[, initial]) -> value

Apply a function of two arguments cumulatively to the items of a sequence,
from left to right, so as to reduce the sequence to a single value.
For example, reduce(lambda x, y: x+y, [1, 2, 3, 4, 5]) calculates
((((1+2)+3)+4)+5).  If initial is present, it is placed before the items
of the sequence in the calculation, and serves as a default when the
sequence is empty.
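
The last sentence of the docstring matters for factorials, because 0! needs an empty range. Here is a small sketch, assuming we want a factorial function that also handles n = 0; the function name factorial is made up for illustration.

```python
from functools import reduce  # a built-in in Python 2, but importable there too
from operator import mul

def factorial(n):
    """n! via reduce; the initial value 1 covers the empty range for n = 0."""
    return reduce(mul, range(1, n + 1), 1)

print([factorial(n) for n in range(0, 6)])

# Without an initial value, reduce raises TypeError on an empty sequence.
try:
    reduce(mul, range(1, 1))
except TypeError as te:
    print("TypeError: {}".format(te))
```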
Did you know that there are many different kinds of factorials? To learn more, check out this post:

Permutation facts

- Enjoy.




Thursday, January 3, 2019

Multiple item search in an unsorted list in Python


- By Vasudev Ram - Online Python training / SQL training / Linux training



Hi, readers,

I was reviewing simple algorithms with a view to using some as examples or exercises in my Python programming course. While doing so, I thought of enhancing simple linear search for one item in a list, to make it search for multiple items.

Here are a couple of program versions I wrote for that task. They use straightforward logic. There are just a few additional points:

- In both programs, I use a generator to yield the values found (the index and the item).
- In the first program, I print out the index and item for each item found.
- In the second program, I mark where the items are found with text "arrows".

This is the first program, mult_item_search_unsorted_list.py:
# mult_item_search_unsorted_list.py 
# Purpose: To search for multiple items in an unsorted list.
# Prints each item found and its index.
# Author: Vasudev Ram
# Copyright 2019 Vasudev Ram
# Training: https://jugad2.blogspot.com/p/training.html
# Blog: https://jugad2.blogspot.com
# Web site: https://vasudevram.github.io
# Product store: https://gumroad.com/vasudevram

from __future__ import print_function
import sys
from random import sample, shuffle

def mult_item_search_unsorted_list(dlist, slist):
    for didx, ditem in enumerate(dlist):
        for sitem in slist:
            if sitem == ditem:
                yield (didx, ditem)

def main():
    # Create the search list (slist) with some items that will be found 
    # and some that will not be found in the data list (dlist) below.
    slist = sample(range(0, 10), 3) + sample(range(10, 20), 3)
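    # Create the data list (a real list, so that shuffle() can reorder
    # it in place in Python 3 as well as in Python 2).
    dlist = list(range(10))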
    for i in range(3):
        # Mix it up, DJ.
        shuffle(slist)
        # MIX it up, DEK.
        shuffle(dlist)
        print("\nSearching for:", slist)
        print("    in:", dlist)
        for didx, ditem in mult_item_search_unsorted_list(dlist, slist):
            print("        found {} at index {}".format(ditem, didx))
    
main()
Output of a run:
$ python mult_item_search_unsorted_list.py

Searching for: [1, 18, 3, 15, 19, 4]
    in: [8, 9, 1, 2, 0, 7, 5, 3, 6, 4]
        found 1 at index 2
        found 3 at index 7
        found 4 at index 9

Searching for: [4, 19, 18, 15, 1, 3]
    in: [7, 5, 8, 2, 9, 4, 0, 3, 6, 1]
        found 4 at index 5
        found 3 at index 7
        found 1 at index 9

Searching for: [1, 3, 4, 18, 19, 15]
    in: [9, 6, 1, 8, 7, 4, 3, 0, 2, 5]
        found 1 at index 2
        found 4 at index 5
        found 3 at index 6
And this is the second program, mult_item_search_unsorted_list_w_arrows.py:
# mult_item_search_unsorted_list_w_arrows.py 
# Purpose: To search for multiple items in an unsorted list.
# Marks the position of the items found with arrows.
# Author: Vasudev Ram
# Copyright 2019 Vasudev Ram
# Training: https://jugad2.blogspot.com/p/training.html
# Blog: https://jugad2.blogspot.com
# Web site: https://vasudevram.github.io
# Product store: https://gumroad.com/vasudevram

from __future__ import print_function
import sys
from random import sample, shuffle

def mult_item_search_unsorted_list(dlist, slist):
    for didx, ditem in enumerate(dlist):
        for sitem in slist:
            if sitem == ditem:
                yield (didx, ditem)

def main():
    # Create the search list (slist) with some items that will be found 
    # and some that will not be found in the data list (dlist) below.
    slist = sample(range(10), 4) + sample(range(10, 20), 4)
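    # Create the data list (a real list, so that shuffle() can reorder
    # it in place in Python 3 as well as in Python 2).
    dlist = list(range(10))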
    for i in range(3):
        # Mix it up, DJ.
        shuffle(slist)
        # MIX it up, DEK.
        shuffle(dlist)
        print("\nSearching for: {}".format(slist))
        print("    in: {}".format(dlist))
        for didx, ditem in mult_item_search_unsorted_list(dlist, slist):
            print("---------{}^".format('---' * didx))
    
main()
Output of a run:
$ python mult_item_search_unsorted_list_w_arrows.py

Searching for: [16, 0, 15, 4, 6, 1, 10, 12]
    in: [8, 9, 0, 1, 5, 4, 7, 2, 6, 3]
---------------^
------------------^
------------------------^
---------------------------------^

Searching for: [6, 16, 10, 0, 1, 4, 12, 15]
    in: [2, 7, 0, 8, 1, 4, 6, 3, 9, 5]
---------------^
---------------------^
------------------------^
---------------------------^

Searching for: [0, 12, 4, 10, 6, 16, 1, 15]
    in: [8, 1, 0, 7, 9, 6, 2, 5, 4, 3]
------------^
---------------^
------------------------^
---------------------------------^

In a recent post, Dynamic function creation at run time with Python's eval built-in, I had said:

"Did you notice any pattern to the values of g(i)? The values are 1, 4, 9, 16, 25 - which are the squares of the integers 1 to 5. But the formula I entered for g was not x * x, rather, it was x * x + 2 * x + 1. Then why are squares shown in the output? Reply in the comments if you get it, otherwise I will answer next time."

No reader commented with a solution. So here is a hint to figure it out:

What is the expansion of (a + b) ** 2 (a plus b the whole squared) in algebra?

Heh.

The drawing of the magnifying glass at the top of the post is by:

Yours truly.

( The same one that I used in this post:
Command line D utility - find files matching a pattern under a directory )

I'll leave you with another question: What, if any, could be the advantage of using Python generators in programs like these?
Notice that I said "programs like these", not "these programs".
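
One possible angle on that question, offered as a sketch and not as the only answer: because a generator produces matches on demand, a caller that needs only the first few matches can stop early, and the rest of the data list is never scanned. The data list size and the search values below are made up for illustration.

```python
from itertools import islice

def mult_item_search_unsorted_list(dlist, slist):
    # Same generator function as in the two programs above.
    for didx, ditem in enumerate(dlist):
        for sitem in slist:
            if sitem == ditem:
                yield (didx, ditem)

# Hypothetical larger data list, chosen only for illustration.
dlist = list(range(100000))
slist = [5, 7, 99999]

# Take only the first two matches. The scan stops right after index 7,
# so the remaining items of dlist are never examined.
first_two = list(islice(mult_item_search_unsorted_list(dlist, slist), 2))
print(first_two)
```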

Enjoy.



Sunday, September 23, 2018

How many ways can you substring a string? Part 2


By Vasudev Ram


Twine image attribution

Hi readers,

In my last post, i.e.:

How many ways can you substring a string? Part 1, I had said that there can be other ways of doing it, and that some enhancements were possible. This post (Part 2) is about that.

Here is another algorithm to find all substrings of a given string:

Let s be the input string.
Let n be the length of s.
Find and yield all substrings of s of length 1.
Find and yield all substrings of s of length 2.
...
Find and yield all substrings of s of length n.

Even without doing any formal analysis of the algorithm, we can intuitively see that it is correct, because it accounts for all possible cases (except for the empty string, but adding that is trivial).

[ BTW, what about uses for this sort of program? Although I just wrote it for fun, one possible use could be in word games like Scrabble. ]

The code for this new algorithm is in program all_substrings2.py below.
"""
all_substrings2.py
Function and program to find all substrings of a given string.
Author: Vasudev Ram
Copyright 2018 Vasudev Ram
Web site: https://vasudevram.github.io
Blog: https://jugad2.blogspot.com
Twitter: https://mobile.twitter.com/vasudevram
Product store: https://gumroad.com/vasudevram
"""

from __future__ import print_function
import sys
from error_exit import error_exit
from debug1 import debug1

def usage():
    message_lines = [\
        "Usage: python {} a_string".format(sa[0]),
        "Print all substrings of a_string.",
        "",
    ]
    sys.stderr.write("\n".join(message_lines))

def all_substrings2(s):
    """
    Generator function that yields all the substrings 
    of a given string s.
    Algorithm used:
    1. len_s = len(s)
    2. if len_s == 0, yield nothing (main() handles the empty string)
    3. (else len_s is > 0):
       for substr_len in 1 to len_s:
           find all substrings of s that are of length substr_len
           yield each such substring 
    Expected output for some strings:
    For "a":
        "a"
    For "ab":
        "a"
        "b"
        "ab"
    For "abc":
        "a"
        "b"
        "c"
        "ab"
        "bc"
        "abc"
    For "abcd":
        "a"
        "b"
        "c"
        "d"
        "ab"
        "bc"
        "cd"
        "abc"
        "bcd"
        "abcd"
    """

    len_s = len(s)
    substr_len = 1
    while substr_len <= len_s:
        start = 0
        end = start + substr_len
        while end <= len_s:
            debug1("s[{}:{}] = {}".format(\
                start, end, s[start:end]))
            yield s[start:end]
            start += 1
            end = start + substr_len
        substr_len += 1

def main():
    if lsa != 2:
        usage()
        error_exit("\nError: Exactly one argument must be given.\n")

    if sa[1] == "":
        print("")
        sys.exit(0)

    for substring in all_substrings2(sa[1]):
        print(substring)

sa = sys.argv
lsa = len(sa)

if __name__ == "__main__":
    main()
BTW, I added the empty string as the last item in the message_lines list (in the usage() function), as a small trick, to avoid having to explicitly add an extra newline after the joined string in the write() call.

Here are some runs of the program, with outputs, using Python 2.7 on Linux:

(pyo, in the commands below, is a shell alias I created for 'python -O', to disable debugging output. And a*2*y expands to all_substrings2.py, since no other filename in my current directory matches that wildcard pattern. It's a common Unix shortcut to save typing. In fact, the bash shell expands the pattern to the full filename when you type it and then press Tab. But the expansion happens without pressing Tab too, if you just type the command and hit Enter. You do have to know for sure, up front, that the wildcard expands to only one filename (if that is what you want), or you can get wrong results. For example, if such a wildcard expands to 3 filenames, and your program expects command-line arguments, the 2nd and 3rd filenames will be treated as command-line arguments for the program named by the 1st filename. This will likely not be what you want, and may create problems.)

Run it without any arguments:
$ pyo a*2*y
Usage: python all_substrings2.py a_string
Print all substrings of a_string.

Error: Exactly one argument must be given.
Run a few times with some input strings of incrementally longer lengths:
$ pyo a*2*y a
a
$ pyo a*2*y ab
a
b
ab
$ pyo a*2*y abc
a
b
c
ab
bc
abc
$ pyo a*2*y abcd
a
b
c
d
ab
bc
cd
abc
bcd
abcd
Count the number of substrings in the above run for string abcd:
$ pyo a*2*y abcd | wc -l
10
$ pyo a*2*y abcde
a
b
c
d
e
ab
bc
cd
de
abc
bcd
cde
abcd
bcde
abcde
Count the number of substrings in the above run for string abcde:
$ pyo a*2*y abcde | wc -l
15
$ pyo a*2*y abcdef
a
b
c
d
e
f
ab
bc
cd
de
ef
abc
bcd
cde
def
abcd
bcde
cdef
abcde
bcdef
abcdef
Count the number of substrings in the above run for string abcdef:
$ pyo a*2*y abcdef | wc
     21      21      77
Now a few more with only the count:
$ pyo a*2*y abcdefg | wc
     28      28     112
$ pyo a*2*y abcdefgh | wc
     36      36     156
$ pyo a*2*y abcdefghi | wc
     45      45     210
Notice a pattern?

The count of substrings for each succeeding run (which has one more character in the input string than the preceding run has) is equal to the count for the preceding run plus the length of the input string for the succeeding run; e.g. 10 + 5 = 15, 15 + 6 = 21, 21 + 7 = 28, etc. So the count for an input string of length n is 1 + 2 + ... + n, i.e. the sum of the first n natural numbers.

There is a well-known formula for that sum: n * (n + 1) / 2.
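
A quick way to check the formula against actual counts is sketched below. The compact generator all_substrings is a stand-in written for this check; it follows the same shortest-first order as all_substrings2.py but is not the program's own code.

```python
def all_substrings(s):
    """Yield all non-empty substrings of s, shortest first."""
    for substr_len in range(1, len(s) + 1):
        for start in range(len(s) - substr_len + 1):
            yield s[start:start + substr_len]

for n in range(1, 10):
    s = "abcdefghi"[:n]
    count = sum(1 for _ in all_substrings(s))
    # Integer division keeps the result an int in both Python 2 and 3.
    expected = n * (n + 1) // 2
    print("n = {}: {} substrings, formula gives {}".format(
        n, count, expected))
```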

There is a story (maybe apocryphal) that the famous mathematician Gauss was posed this problem - to find the sum of the numbers from 1 to 100 - by his teacher, after he misbehaved in class. To the surprise of the teacher, he gave the answer in seconds. From the Wikipedia article about Gauss:

[ Gauss's presumed method was to realize that pairwise addition of terms from opposite ends of the list yielded identical intermediate sums: 1 + 100 = 101, 2 + 99 = 101, 3 + 98 = 101, and so on, for a total sum of 50 × 101 = 5050. ]

From this we can see that the sum of this sequence satisfies the formula n * (n + 1) / 2, where n = 100, i.e. 100 * (100 + 1) / 2 = 50 * 101 = 5050.

(Wikipedia says that Gauss "is ranked among history's most influential mathematicians".)

We can also run the all_substrings2.py program multiple times with different inputs, using a for loop in the shell:
$ for s in a ab abc abcd
> do
>   echo All substrings of $s:
>   pyo al*2*py $s
> done

All substrings of a:
a
All substrings of ab:
a
b
ab
All substrings of abc:
a
b
c
ab
bc
abc
All substrings of abcd:
a
b
c
d
ab
bc
cd
abc
bcd
abcd
Some remarks on the program versions shown (i.e. all_substrings.py and all_substrings2.py, in Parts 1 and 2 respectively):

Both versions use a generator function, to lazily yield each substring on demand. Either version can easily be changed to use a list instead of a generator (and the basic algorithm used will not need to change, in either case). To do that, we replace the yield statement with code that appends each generated substring to a new list, and at the end, return that list to the caller. The caller's code will not need to change, although we will now be iterating over the list returned from the function, not over the values yielded by the generator. Some of the pros and cons of the two approaches (generator vs. list) are:

- the list approach has to create and store all the substrings first, before it can return them. So it uses memory proportional to the sum of the sizes of all the substrings generated, with some overhead due to Python's dynamic nature (but that per-string overhead exists for the generator approach too). (See this post: Exploring sizes of data types in Python.) The list approach will need a bit of overhead for the list too. But the generator approach needs to handle only one substring at a time, before yielding it to the caller, and does not use a list. So it will potentially use much less memory, particularly for larger input strings. The generator approach may even be faster than the list version, since repeated memory (re)allocation for the list (as it expands) has some overhead. But that is speculation on my part as of now. To be sure of it, one would have to do some analysis and/or some speed measurements of relevant test programs.

- the list approach gives you the complete list of substrings (after the function that generates them returns). So, in the caller, if you want to do multiple processing passes over them, you can. But the generator approach gives you each substring immediately as it is generated, you have to process it, and then it is gone. So you can only do one processing pass over the substrings generated. In other words, the generator's output is sequential-access, forward-only, one-item-at-a-time-only, and single-pass-only. (Unless you store all the yielded substrings, but then that becomes the same as the list approach.)
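
The two approaches compared above can be sketched side by side. This is a minimal illustration, assuming the simple nested-loop algorithm from Part 1; the names all_substrings_gen and all_substrings_list are hypothetical and are not taken from either program version.

```python
# Sketch only: both function names are hypothetical, chosen to contrast
# the generator and list approaches using the same nested-loop algorithm.

def all_substrings_gen(s):
    """Generator version: yields one substring at a time."""
    for start in range(len(s)):
        for end in range(start + 1, len(s) + 1):
            yield s[start:end]

def all_substrings_list(s):
    """List version: same loops, but collects everything before returning."""
    result = []
    for start in range(len(s)):
        for end in range(start + 1, len(s) + 1):
            result.append(s[start:end])
    return result

# The caller's loop looks the same for both versions.
for substring in all_substrings_gen("abc"):
    print(substring)
for substring in all_substrings_list("abc"):
    print(substring)

# A generator is single-pass: once consumed, it yields nothing more.
gen = all_substrings_gen("abc")
first_pass = list(gen)
second_pass = list(gen)
print(first_pass)
print(second_pass)
```

The second pass over the generator comes back empty, while the list returned by all_substrings_list can be iterated over as many times as needed.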

Another enhancement that can be useful is to output only the unique substrings. As I showed in Part 1, if there are any repeated characters in the input string, there can be duplicate substrings in the output. There are two obvious ways of getting only unique substrings:

1) By doing it internal to the program, using a Python dict. All we have to do is add each substring (as a key, with the corresponding value being anything, say None), to a dict, as and when the substring is generated. Then the substrings in the dict are guaranteed to be unique. Then at the end, we just print the substrings from the dict instead of from the list. If we want to print the substrings in the same order they were generated, we can use an OrderedDict.

See: Python 2 OrderedDict
and: Python 3 OrderedDict

(Note: In Python 3.7 and later, OrderedDict is no longer needed for this, because regular dicts are guaranteed to preserve insertion order.)
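
Here is a minimal sketch of the dict-based approach from point 1. The function names are hypothetical, and the generator is a stand-in that yields substrings shortest-first (it is not the code of all_substrings2.py). OrderedDict is used so that the order is kept on older Python versions too.

```python
# Sketch only: substrings_by_length and unique_substrings are hypothetical
# names; the generator is a stand-in, not the code of all_substrings2.py.
from collections import OrderedDict

def substrings_by_length(s):
    """Yield all non-empty substrings of s, shortest first."""
    n = len(s)
    for length in range(1, n + 1):
        for start in range(n - length + 1):
            yield s[start:start + length]

def unique_substrings(s):
    """Return the distinct substrings of s, in first-generated order."""
    seen = OrderedDict()
    for substring in substrings_by_length(s):
        # The value does not matter; only the keys are used.
        seen[substring] = None
    return list(seen)

for substring in unique_substrings("aabbb"):
    print(substring)
```

Unlike the sort and uniq pipeline, this keeps the substrings in the order in which they were first generated, rather than in sorted order.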

2) By piping the output of the program (which is all the generated substrings, one per line) to the Unix uniq command, which drops repeated lines from its input. But uniq only compares adjacent lines, so we have to sort the output first, to bring the duplicates next to each other. We can do that with pipelines like the following:

First, without sort and uniq; there are duplicates:

$ pyo all_substrings2.py aabbb | nl -ba
1 a
2 a
3 b
4 b
5 b
6 aa
7 ab
8 bb
9 bb
10 aab
11 abb
12 bbb
13 aabb
14 abbb
15 aabbb

Then with sort and uniq; now there are no duplicates:

$ pyo all_substrings2.py aabbb | sort | uniq | nl -ba
1 a
2 aa
3 aab
4 aabb
5 aabbb
6 ab
7 abb
8 abbb
9 b
10 bb
11 bbb

The man pages for sort and uniq are here:

sort
uniq

That's it for now. I have a few more points which I may want to add; if I decide to do so, I'll do them in a Part 3 post.

The image at the top of the post is of spools of twine (a kind of string) from Wikipedia.

- Enjoy.


- Vasudev Ram - Online Python training and consulting

I conduct online courses on Python programming, Unix/Linux (commands and shell scripting) and SQL programming and database design, with personal coaching sessions.

Contact me for details of course content, terms and schedule.

DPD: Digital Publishing for Ebooks and Downloads.

Hit the ground running with my vi quickstart tutorial. I wrote it at the request of two Windows system administrator friends who were given additional charge of some Unix systems. They later told me that it helped them to quickly start using vi to edit text files on Unix.

Check out WP Engine, powerful WordPress hosting.

Creating online products for sale? Check out ConvertKit, email marketing for online creators.

Teachable: feature-packed course creation platform, with unlimited video, courses and students.

Track Conversions and Monitor Click Fraud with Improvely.

Posts about: Python * DLang * xtopdf

My ActiveState Code recipes



Wednesday, September 12, 2018

How many ways can you substring a string? Part 1


By Vasudev Ram




String image attribution

Recently, something I read made me think of writing a simple program to generate all substrings of a given string.
(To be precise, excluding the null string.)

Here is an initial version I came up with, all_substrings.py:
"""
all_substrings.py
Function and program to find all the substrings of a given string.
Author: Vasudev Ram
Copyright 2018 Vasudev Ram
Web site: https://vasudevram.github.io
Blog: https://jugad2.blogspot.com
Twitter: https://mobile.twitter.com/vasudevram
Product Store: https://gumroad.com/vasudevram
"""

from __future__ import print_function
import sys
from error_exit import error_exit
from debug1 import debug1

def usage():
    message_lines = [
        "Usage: python {} a_string".format(sa[0]),
        "Print all substrings of a_string.",
    ]
    sys.stderr.write("\n".join(message_lines))

def all_substrings(s):
    """
    Generator function that yields all the substrings of a given string.
    """

    ls = len(s)
    if ls == 0:
        usage()
        error_exit("\nError: String argument must be non-empty.")

    start = 0
    while start < ls:
        end = start + 1
        while end <= ls:
            debug1("s[{}:{}] = {}".format(start, end, s[start:end]))
            yield s[start:end]
            end += 1
        start += 1

def main():
    if lsa != 2:
        usage()
        error_exit("\nError: Exactly one argument must be given.")

    for substring in all_substrings(sa[1]):
        print(substring)

sa = sys.argv
lsa = len(sa)

if __name__ == "__main__":
    main()
Some runs and output of the program:

With no command-line arguments:
$ python all_substrings.py
Usage: python all_substrings.py a_string
Print all substrings of a_string.
Error: Exactly one argument must be given.
With one command-line argument, an empty string:
$ python all_substrings.py ""
Usage: python all_substrings.py a_string
Print all substrings of a_string.
Error: String argument must be non-empty.
Now with a 3-character string, with debugging enabled, via the use of my debug1 debugging function [1] (and Python's __debug__ built-in variable, which is set to True by default):
$ python all_substrings.py abc
s[0:1] = a
a
s[0:2] = ab
ab
s[0:3] = abc
abc
s[1:2] = b
b
s[1:3] = bc
bc
s[2:3] = c
c
[1] You can read about and get the code for that debugging function here:

Improved simple Python debugging function

The remaining runs are with debugging turned off via Python's -O flag:

With a 4-character string:
$ python -O all_substrings.py abcd
a
ab
abc
abcd
b
bc
bcd
c
cd
d
With a 4-character string, not all characters unique:
$ python -O all_substrings.py FEED
F
FE
FEE
FEED
E
EE
EED
E
ED
D
Note that when there are duplicated characters in the input, we can get duplicate substrings in the output; in this case, E appears twice.

With a string of length 6, again with some characters repeated (E and D):
$ python -O all_substrings.py FEEDED
F
FE
FEE
FEED
FEEDE
FEEDED
E
EE
EED
EEDE
EEDED
E
ED
EDE
EDED
D
DE
DED
E
ED
D
Again, we get duplicate substrings in the output.

With a 6-character string, no duplicate characters:
$ python -O all_substrings.py 123456
1
12
123
1234
12345
123456
2
23
234
2345
23456
3
34
345
3456
4
45
456
5
56
6
Is there any other way of doing it?
Any interesting enhancements possible?

Yes to both questions.
I will cover some of those points in a follow-up post.

Actually, I already did one thing in the current version, which is of interest: I used a generator to yield the substrings lazily, instead of creating them all upfront, and then returning them all in a list. I'll show and discuss a few pros and cons of some other approaches later.

Meanwhile, want to have a bit of fun with visual effects?

Try some variations of runs of the program like these:


python -O all_substrings.py +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-

python -O all_substrings.py /\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\

python -O all_substrings.py "%% $$ @@ && %% $$ @@ && %% $$ @@ && %% $$ @@ && %% $$ @@ && %% $$ @@ && %% $$ @@ && %% $$ @@ && %% $$ @@ && %% $$ @@ && %% $"

$ python -O all_substrings.py 10101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010

python -O all_substrings.py ">>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>"

You can change the characters used in the string argument to any combination of any punctuation characters, or even letters or digits - anything you like. You can also vary the number of characters used in the string. Longer ones (though not too long) tend to give better visual effects and the display also lasts for longer. Note that all the characters of the string you use, should come on the same single line, the same line as the python command you use. Also, if using a pipe character (|) (or any other characters that are special to your OS shell), enclose the whole string in quotes as I have done in an example above. I ran this on Windows and so used double quotes for such cases. Single quotes give errors. On Unix-like systems, either may work, but some characters may get interpreted inside double quotes. Experiment :)

You can also add an import time statement in the imports section of the program, and then use a time.sleep(number) inside the for loop, say, just above the print(substring) statement. I used values like:
time.sleep(0.002)
which works well for my display. You can tweak that number for your hardware.
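
The change described above might look like the following. This is a minimal self-contained sketch, not the modified program itself: show_slowly is a hypothetical helper name, and the generator is a simplified copy of all_substrings without the usage and debugging calls.

```python
# Sketch only: show_slowly is a hypothetical helper; all_substrings here
# is a simplified copy of the generator, without usage() or debug1().
import time

def all_substrings(s):
    for start in range(len(s)):
        for end in range(start + 1, len(s) + 1):
            yield s[start:end]

def show_slowly(s, delay=0.002):
    """Print each substring of s, pausing for delay seconds before each."""
    count = 0
    for substring in all_substrings(s):
        time.sleep(delay)  # tweak this value for your hardware
        print(substring)
        count += 1
    return count

show_slowly("+-" * 10)
```

Returning the count is not needed for the visual effect; it is only there to make the helper easy to check.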

- Have fun.

Did you know that there are a large number of meanings and contexts for the word string? Here are some of them:

String (Wikipedia).

This Wikipedia article about strings in computer science is interesting, and has a lot more points than one might imagine at first:

(computer) strings


- Vasudev Ram - Online Python training and consulting



Thursday, April 27, 2017

Using nested conditional expressions to classify characters

By Vasudev Ram


While writing some Python code, I happened to use a conditional expression, a Python language feature.

Conditional expressions are expressions (not statements) that have if/else clauses inside them, and they evaluate to either one of two values (in the basic case), depending on the value of a boolean condition. For example:
for n in range(4):
    print n, 'is odd' if n % 2 == 1 else 'is even'
0 is even
1 is odd
2 is even
3 is odd
Here, the conditional expression is this part of the print statement above:
'is odd' if n % 2 == 1 else 'is even'
This expression evaluates to 'is odd' if the condition after the if is True, and evaluates to 'is even' otherwise. So it evaluates to a string in either case, and that string gets printed (after the value of n).

Excerpt from the section about conditional expressions in the Python Language Reference:

[
conditional_expression ::= or_test ["if" or_test "else" expression]
expression ::= conditional_expression | lambda_expr

Conditional expressions (sometimes called a “ternary operator”) have the lowest priority of all Python operations.

The expression x if C else y first evaluates the condition, C (not x); if C is true, x is evaluated and its value is returned; otherwise, y is evaluated and its value is returned.
]
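
The evaluation order described in that excerpt can be observed with a small sketch. The helper note and the list evaluated are hypothetical names, used only to record which parts of the expression actually get evaluated.

```python
# Sketch only: note() and evaluated are hypothetical, used to record
# the order in which parts of the conditional expression are evaluated.
evaluated = []

def note(label, value):
    evaluated.append(label)
    return value

result = note("x", "then-value") if note("C", True) else note("y", "else-value")

print(result)
print(evaluated)
```

The condition C is recorded first, then x; y is never evaluated, so it does not appear in the list at all.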

You can see that the definition of conditional_expression is recursive, since it is partly defined in terms of itself (via the definition of expression).

This implies that you can have recursive or nested conditional expressions.
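
As a small illustration of nesting, here is a sketch with two hypothetical functions, sign and sign_explicit. It shows that a chained conditional expression groups to the right, i.e. the second conditional expression is the else part of the first.

```python
# Sketch only: sign and sign_explicit are hypothetical example functions.

def sign(n):
    # Chained form, without parentheses.
    return 'negative' if n < 0 else 'zero' if n == 0 else 'positive'

def sign_explicit(n):
    # The same expression with the grouping made explicit.
    return 'negative' if n < 0 else ('zero' if n == 0 else 'positive')

for n in (-5, 0, 7):
    print("{} is {}".format(n, sign(n)))
```

Both functions should give the same result for any number, since the parentheses in sign_explicit only spell out the grouping that Python already uses.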

Also, since the syntax of the Python return statement is:
return [ expression_list ]
(where expression_list means one or more expressions, separated by commas), it follows that we can use a nested conditional expression in a return statement (because a nested conditional expression is an expression).

Here is a small program to demonstrate that:
'''
File: return_with_nested_cond_exprs.py 
Purpose: Demonstrate nested conditional expressions used in a return statement, 
to classify letters in a string as lowercase, uppercase or neither.
Also demonstrates doing the same task without a function and a return, 
using a lambda and map instead.
Author: Vasudev Ram
Copyright 2017 Vasudev Ram
Web site: https://vasudevram.github.io
Blog: https://jugad2.blogspot.com
'''

from __future__ import print_function
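# ascii_lowercase and ascii_uppercase exist in both Python 2 and Python 3
# (the plain names lowercase and uppercase are available in Python 2 only).
from string import ascii_lowercase as lowercase, ascii_uppercase as uppercase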

# Use return with nested conditional expressions inside a function, 
# to classify characters in a string as lowercase, uppercase or neither:
def classify_char(ch):
    return ch + ': ' + ('lowercase' if ch in lowercase else \
    'uppercase' if ch in uppercase else 'neither')

print("Classify using a function:")
for ch in 'AaBbCc12+-':
    print(classify_char(ch))

print()

# Do it using map and lambda instead of def and for:
print("Classify using map and lambda:")

print('\n'.join(map(lambda ch: ch + ': ' + ('lowercase' if ch in lowercase else 
'uppercase' if ch in uppercase else 'neither'), 'AaBbCc12+-')))
Running it with:
$ python return_with_nested_cond_exprs.py
gives this output:
Classify using a function:
A: uppercase
a: lowercase
B: uppercase
b: lowercase
C: uppercase
c: lowercase
1: neither
2: neither
+: neither
-: neither

Classify using map and lambda:
A: uppercase
a: lowercase
B: uppercase
b: lowercase
C: uppercase
c: lowercase
1: neither
2: neither
+: neither
-: neither
As you can see from the code and the output, I also used that same nested conditional expression in a lambda function, along with map, to do the same task in a more functional style.

- Vasudev Ram - Online Python training and consulting




Friday, April 21, 2017

Python callbacks using classes and methods

By Vasudev Ram

Hi readers,

I had written this post a few days ago:

Implementing and using callbacks in Python

In it, I had shown how to create simple callbacks using just plain Python functions, and said I would write a bit more about callbacks in my next post.

This is that next post. It discusses how to create callbacks in Python using classes and methods.

Here is an example program, callback_demo2.py, that shows how to do that:
'''
File: callback_demo2.py
To demonstrate implementation and use of callbacks in Python, 
using classes with methods as the callbacks.
Author: Vasudev Ram
Copyright 2017 Vasudev Ram
Web site: https://vasudevram.github.io
Blog: https://jugad2.blogspot.com
Product store: https://gumroad.com/vasudevram
'''

from __future__ import print_function
import sys
from time import sleep

class FileReporter(object):

    def __init__(self, filename):
        self._filename = filename
        try:
            self._fil = open(self._filename, "w")
        except IOError as ioe:
            sys.stderr.write("While opening {}, caught IOError: {}\n".format(
                self._filename, repr(ioe)))
            sys.exit(1)

    def report(self, message):
        self._fil.write(message)

class ScreenReporter(object):

    def __init__(self, dest):
        self._dest = dest

    def report(self, message):
        self._dest.write(message)

def square(i):
    return i * i

def cube(i):
    return i * i * i

def processor(process, times, report_interval, reporter):
    result = 0
    for i in range(1, times + 1):
        result += process(i)
        sleep(0.1)
        if i % report_interval == 0:
            # This is the call to the callback method 
            # that was passed to this function.
            reporter.report("Items processed: {}. Running result: {}.\n".format(i, result))

file_reporter = FileReporter("processor_report.txt")
processor(square, 20, 5, file_reporter)

stdout_reporter = ScreenReporter(sys.stdout)
processor(square, 20, 5, stdout_reporter)

stderr_reporter = ScreenReporter(sys.stderr)
processor(square, 20, 5, stderr_reporter)
I ran it with:
$ python callback_demo2.py >out 2>err
The above command creates 3 files, processor_report.txt, out and err.
Running fc /l (the Windows file compare command) on those 3 files, pairwise, shows that all three have the same content, but the output has gone to 3 different destinations (a specified file, standard output, and standard error output), based on which callback was passed to the processor function in the 3 calls to it.

These two lines from the program:
file_reporter = FileReporter("processor_report.txt")
processor(square, 20, 5, file_reporter)
send output to the file processor_report.txt.

These two lines:
stdout_reporter = ScreenReporter(sys.stdout)
processor(square, 20, 5, stdout_reporter)
send output to the standard output (stdout), which is redirected to the file out.

These two lines:
stderr_reporter = ScreenReporter(sys.stderr)
processor(square, 20, 5, stderr_reporter)
send output to the standard error output (stderr), which is redirected to the file err.

The difference between this program (callback_demo2.py) and the one in the previous post (callback_demo.py), is that in this one, I pass an instance of some class to the processor function, as its last argument. This argument is the callback. And this time, rather than treating it as a function, processor treats it as an object, and invokes the report method on it, giving us much the same output as before (I just made minor cosmetic changes in the output). This same thing is done in all the 3 calls to processor. The difference is that different types of objects are passed each time, for the callback argument.

[ An interesting side note here is that in some other languages, for example, Java, at least in earlier Java versions (I'm not sure about recent ones), we cannot just pass different types of objects for the same callback parameter, unless they are all derived from some common base class (via inheritance). (Java interfaces offer another way to do it, since unrelated classes can implement the same interface.) While that is also possible in Python, it is not necessary, as we see here, due to Python's more dynamic nature. Just passing any object that implements a report method is enough, as the program shows. The FileReporter and ScreenReporter classes do not (need to) descend from some common base class (other than object, but that is just the syntax for new-style classes; the object class does not provide the report method). Python's smooth polymorphism, a.k.a. duck typing, takes care of it. ]
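
To illustrate the duck typing point, here is a minimal sketch with a third kind of reporter. ListReporter is a hypothetical class that is not part of callback_demo2.py; it shares no base class with the other reporters, it just has a report method. The processor function is a copy of the one above with the sleep call left out.

```python
# Sketch only: ListReporter is a hypothetical class, not part of
# callback_demo2.py. processor is copied from it, minus the sleep call.

class ListReporter(object):
    """Collects the reported messages in a list, in memory."""
    def __init__(self):
        self.messages = []

    def report(self, message):
        self.messages.append(message)

def square(i):
    return i * i

def processor(process, times, report_interval, reporter):
    result = 0
    for i in range(1, times + 1):
        result += process(i)
        if i % report_interval == 0:
            # Any object with a report method will do here.
            reporter.report(
                "Items processed: {}. Running result: {}.\n".format(i, result))

list_reporter = ListReporter()
processor(square, 20, 5, list_reporter)
print("".join(list_reporter.messages))
```

The processor function does not need to change at all to work with the new class.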

The first time, a FileReporter object is passed, so the output goes to a text file.

The second and third times, a ScreenReporter object is passed, initializing the _dest field to stdout and stderr respectively, so the output goes to those two destinations respectively. And the command line redirects those outputs to the files out and err.

Although I didn't say so in the previous post, the first argument to processor (square), is also a callback, since it, in turn, is called from processor, via the process parameter. So I can pass some other argument in place of it, like the cube function defined in the program, to get different computations done and hence different results.
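
A minimal sketch of that variation, passing cube instead of square, is below. It assumes simplified copies of ScreenReporter and processor (no sleep call). Unlike the original, this processor also returns the final result; that is an addition made here only so the outcome is easy to check.

```python
# Sketch only: simplified copies of the classes and functions above.
# Returning the result from processor is an addition for this sketch.
import sys

class ScreenReporter(object):
    def __init__(self, dest):
        self._dest = dest

    def report(self, message):
        self._dest.write(message)

def cube(i):
    return i * i * i

def processor(process, times, report_interval, reporter):
    result = 0
    for i in range(1, times + 1):
        # process is itself a callback: square, cube, or anything similar.
        result += process(i)
        if i % report_interval == 0:
            reporter.report(
                "Items processed: {}. Running result: {}.\n".format(i, result))
    return result

total = processor(cube, 20, 5, ScreenReporter(sys.stdout))
```

Only the first argument differs from the calls in the program; the reporting callback is used exactly as before.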




Keen on creating and selling products online? Check out the free Product Creation Masterclass. It runs for about 4 weeks from 24 April 2017 (that's 3 days from now), with lots of emails, videos and expert interviews, all geared toward helping you create and sell your first product online. Check it out: The Product Creation Masterclass.

- Vasudev Ram - Online Python training and consulting




Tuesday, April 18, 2017

Implementing and using callbacks in Python

By Vasudev Ram

Callbacks (or callback functions, as they are also called) are a useful and somewhat powerful programming technique. I first learned about the technique (in C) during a course on X-Windows, XLib and Motif, that I attended years ago.

Wikipedia article about callbacks.

( Also see this :)

In this post I'll show a simple way of implementing callbacks in Python. Due to Python's dynamic nature, it is quite easy to implement callbacks in it, more so than in some other languages, like C++ or Java. (But it's not difficult in those languages either.)

The program callback_demo.py (below) shows a simple way to implement and use callbacks, using just plain functions, i.e. no classes or anything more sophisticated are needed:
# File: callback_demo.py
# To demonstrate implementation and use of callbacks in Python, 
# using just plain functions.
# Author: Vasudev Ram
# Copyright 2017 Vasudev Ram
# Web site: https://vasudevram.github.io
# Blog: https://jugad2.blogspot.com
# Product store: https://gumroad.com/vasudevram

from __future__ import print_function
from time import sleep

def callback_a(i, result):
    print("Items processed: {}. Running result: {}.".format(i, result))

def square(i):
    return i * i

def processor(process, times, report_interval, callback):
    print("Entered processor(): times = {}, report_interval = {}, callback = {}".format(
    times, report_interval, callback.func_name))
    # Can also use callback.__name__ instead of callback.func_name in line above.
    result = 0
    print("Processing data ...")
    for i in range(1, times + 1):
        result += process(i)
        sleep(1)
        if i % report_interval == 0:
            # This is the call to the callback function 
            # that was passed to this function.
            callback(i, result)

processor(square, 20, 5, callback_a)
And here is the output when I run it:
$ python callback_demo.py
Entered processor(): times = 20, report_interval = 5, callback = callback_a
Processing data ...
Items processed: 5. Running result: 55.
Items processed: 10. Running result: 385.
Items processed: 15. Running result: 1240.
Items processed: 20. Running result: 2870.
$

The function callback_a (the last argument to the call to processor), gets substituted for the function parameter callback in the processor function definition. So callback_a gets called, and it reports the progress of the work being done. If we passed a different callback function instead of callback_a, we could get different behavior, e.g. progress report in a different format, or to a different destination, or something else, without changing the actual processor function. So this is a way of creating functions with low coupling, or putting it in other terms, creating functions (like processor) that have high cohesion.
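
Here is a minimal sketch of swapping in different callbacks. The functions callback_b and callback_collect are hypothetical examples, not part of callback_demo.py, and processor is a trimmed copy of the one above (no sleep call and no introductory print calls).

```python
# Sketch only: callback_b and callback_collect are hypothetical callbacks;
# processor is a trimmed copy of the one in callback_demo.py.

def callback_b(i, result):
    # Same information as callback_a, in a different format.
    print("[{:>3}] total so far = {}".format(i, result))

log = []

def callback_collect(i, result):
    # A different destination: store the progress instead of printing it.
    log.append((i, result))

def square(i):
    return i * i

def processor(process, times, report_interval, callback):
    result = 0
    for i in range(1, times + 1):
        result += process(i)
        if i % report_interval == 0:
            callback(i, result)

processor(square, 20, 5, callback_b)
processor(square, 20, 5, callback_collect)
print(log)
```

In both calls, processor itself is unchanged; only the callback passed to it differs.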

Read up about coupling and cohesion.

Note that in the program, callback_demo.py, I've shown two ways of getting the name of a function, one being callback.func_name and the other being callback.__name__ (the latter in a comment). Both ways work in Python 2; in Python 3, the func_name attribute no longer exists, so only __name__ works there. Also see this code snippet, which shows that if you define a function foo, its func_name attribute is 'foo'; if you then assign foo to bar, bar's func_name attribute is still 'foo', not 'bar':
>>> def foo(): pass
...
>>> foo.__name__
'foo'
>>> bar = foo
>>>
>>> bar.__name__
'foo'
>>>
>>> foo.func_name
'foo'
>>> bar.func_name
'foo'
I'll talk a bit more about callbacks in my next post.

Enjoy.

- Vasudev Ram - Online Python training and consulting

Do you have the yen to create products? Check out the Product Creation Masterclass. (Not yen as in cash - the class is free.) It runs for about 4 weeks from 24 April 2017, with lots of emails and videos and interviews, all geared toward helping you create and sell your first product online. Check it out: The Product Creation Masterclass.




Sunday, March 12, 2017

Find the number of bits needed to store an integer, and its binary representation (Python)

By Vasudev Ram


Hi readers,

I wrote this post yesterday:

Analyse this Python code snippet

I was initially going to give the solution (to the question asked in that post) today, but then realized that I posted it at the weekend. So, to give a bit of time for anyone to attempt it, including some of my programming students, I decided to post the solution early next week.

But in the meantime, inspired by some Q&A in a class I taught, I had the idea of creating this simple program to find the number of bits needed to represent integers of various sizes (0 to 256, specifically, though the code can easily be modified to do it for any size of int). Note that this is the calculation of the minimum number of bits needed to represent some integers per se, not necessarily the number of bits that Python or any other language actually uses to store those same integers, which can be more than the minimum. This is because, at least in the case of Python, being a dynamic language, most data types have more capabilities than just being data - e.g. ints in Python are objects, so they incur some overhead for being objects (in CPython, for instance, every object carries a reference count and a pointer to its type). The other reason is that data objects in dynamic languages often take up extra pre-allocated space, to store some metadata or to allow for future expansion in the size of the value being stored - e.g. that may or may not apply in the case of ints, but it can in the case of lists.

(See this earlier post by me: Exploring sizes of data types in Python for more on this topic.)

Note: the level of this post is targeted towards relative beginners to programming, who might not be too familiar with computer representation of numbers, or even towards somewhat experienced programmers, who are not familiar with that topic either (I've come across some). Believe it or not, I've come across people (ex-colleagues in some cases, as well as others, and developers as well as system administrators) who did not know that a compiled binary for one processor will usually not run on another type of processor [1], even though they may be running the same OS (such as Unix), because their (processor) instruction sets are different. (What's an instruction set? - some of them might ask.) This is like expecting a person who only knows language A to understand something spoken in language B - impossible, at least without some source of help, be it another person or a dictionary.

Some knowledge of these areas (i.e. system internals or under-the-hood stuff, even at a basic level) is quite useful, and almost essential, in many real-life situations, ranging from things like choosing an appropriate data representation for your data, finding and fixing bugs quicker, system integration, data reading / writing / transformation / conversion, to bigger-picture issues like system performance and portability.

[1] Though there are exceptions to that these days, such as fat binaries.

Anyway, end of rant :)

I chose 256 as the upper limit because it is one more than 255, the highest unsigned integer that can be stored in a single byte, and because values in the range 0 to 255 or 256 are very commonly used in low-level code, such as bit manipulation, assembly or machine language, C, some kinds of data processing (e.g. of many binary file formats), and so on. Of course values of word size (2 bytes / 16 bits) or double-word (4 bytes / 32 bits) are also often used in those areas, but the program can be modified to handle them too.

If you want to get a preview (and a clue) about what is coming up, check this snippet and its output first:
>>> for item in (0, 1, 2, 4, 8, 16):
...     print item.bit_length()
...
0
1
2
3
4
5
Hint: Notice that those are all powers of 2 in the tuple above, and correlate that fact with the output values.
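
One way to explore the hint is to count the bits by hand and compare with bit_length. This is a minimal sketch; bits_needed is a hypothetical helper that counts how many times a non-negative integer can be halved before it reaches zero.

```python
# Sketch only: bits_needed is a hypothetical helper, written to compare
# a hand-rolled bit count against the built-in int.bit_length method.

def bits_needed(n):
    """Minimum number of bits for a non-negative int, by repeated halving."""
    count = 0
    while n > 0:
        n //= 2
        count += 1
    return count

for k in range(6):
    power = 2 ** k
    print("2**{} = {} needs {} bits; bit_length says {}".format(
        k, power, bits_needed(power), power.bit_length()))
```

Each doubling of the value adds one binary digit, which is why the power of two 2**k needs k + 1 bits.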

Here is the program to find the number of bits needed to store an integer, and its binary representation (Python):
# int_bit_length_and_binary_repr.py
# Purpose: For integers from 0 to 256, print the number of 
# bits needed to represent them, and their values in binary.
# Author: Vasudev Ram
# Website: https://vasudevram.github.io
# Product store on Gumroad: https://gumroad.com/vasudevram
# Blog: https://jugad2.blogspot.com
# Twitter: @vasudevram

for an_int in range(0, 256 + 1):
    print an_int, "takes", an_int.bit_length(), "bits to represent,",
    print "and equals", bin(an_int), "in binary"

Before showing the output (which is long, since I've shown all 257 rows of it):

If you found this post informative, you may also be interested in this earlier one on a related topic:

Converting numeric strings to integers with handrolled code

(I didn't remember to say it in that earlier post, but the image at the top of it is of a roti being rolled out with a rolling pin:)

And here is the output when I run the program:
$ python int_bit_length_and_binary_repr.py
0 takes 0 bits to represent, and equals 0b0 in binary
1 takes 1 bits to represent, and equals 0b1 in binary
2 takes 2 bits to represent, and equals 0b10 in binary
3 takes 2 bits to represent, and equals 0b11 in binary
4 takes 3 bits to represent, and equals 0b100 in binary
5 takes 3 bits to represent, and equals 0b101 in binary
6 takes 3 bits to represent, and equals 0b110 in binary
7 takes 3 bits to represent, and equals 0b111 in binary
8 takes 4 bits to represent, and equals 0b1000 in binary
9 takes 4 bits to represent, and equals 0b1001 in binary
10 takes 4 bits to represent, and equals 0b1010 in binary
11 takes 4 bits to represent, and equals 0b1011 in binary
12 takes 4 bits to represent, and equals 0b1100 in binary
13 takes 4 bits to represent, and equals 0b1101 in binary
14 takes 4 bits to represent, and equals 0b1110 in binary
15 takes 4 bits to represent, and equals 0b1111 in binary
16 takes 5 bits to represent, and equals 0b10000 in binary
17 takes 5 bits to represent, and equals 0b10001 in binary
18 takes 5 bits to represent, and equals 0b10010 in binary
19 takes 5 bits to represent, and equals 0b10011 in binary
20 takes 5 bits to represent, and equals 0b10100 in binary
21 takes 5 bits to represent, and equals 0b10101 in binary
22 takes 5 bits to represent, and equals 0b10110 in binary
23 takes 5 bits to represent, and equals 0b10111 in binary
24 takes 5 bits to represent, and equals 0b11000 in binary
25 takes 5 bits to represent, and equals 0b11001 in binary
26 takes 5 bits to represent, and equals 0b11010 in binary
27 takes 5 bits to represent, and equals 0b11011 in binary
28 takes 5 bits to represent, and equals 0b11100 in binary
29 takes 5 bits to represent, and equals 0b11101 in binary
30 takes 5 bits to represent, and equals 0b11110 in binary
31 takes 5 bits to represent, and equals 0b11111 in binary
32 takes 6 bits to represent, and equals 0b100000 in binary
33 takes 6 bits to represent, and equals 0b100001 in binary
34 takes 6 bits to represent, and equals 0b100010 in binary
35 takes 6 bits to represent, and equals 0b100011 in binary
36 takes 6 bits to represent, and equals 0b100100 in binary
37 takes 6 bits to represent, and equals 0b100101 in binary
38 takes 6 bits to represent, and equals 0b100110 in binary
39 takes 6 bits to represent, and equals 0b100111 in binary
40 takes 6 bits to represent, and equals 0b101000 in binary
41 takes 6 bits to represent, and equals 0b101001 in binary
42 takes 6 bits to represent, and equals 0b101010 in binary
43 takes 6 bits to represent, and equals 0b101011 in binary
44 takes 6 bits to represent, and equals 0b101100 in binary
45 takes 6 bits to represent, and equals 0b101101 in binary
46 takes 6 bits to represent, and equals 0b101110 in binary
47 takes 6 bits to represent, and equals 0b101111 in binary
48 takes 6 bits to represent, and equals 0b110000 in binary
49 takes 6 bits to represent, and equals 0b110001 in binary
50 takes 6 bits to represent, and equals 0b110010 in binary
51 takes 6 bits to represent, and equals 0b110011 in binary
52 takes 6 bits to represent, and equals 0b110100 in binary
53 takes 6 bits to represent, and equals 0b110101 in binary
54 takes 6 bits to represent, and equals 0b110110 in binary
55 takes 6 bits to represent, and equals 0b110111 in binary
56 takes 6 bits to represent, and equals 0b111000 in binary
57 takes 6 bits to represent, and equals 0b111001 in binary
58 takes 6 bits to represent, and equals 0b111010 in binary
59 takes 6 bits to represent, and equals 0b111011 in binary
60 takes 6 bits to represent, and equals 0b111100 in binary
61 takes 6 bits to represent, and equals 0b111101 in binary
62 takes 6 bits to represent, and equals 0b111110 in binary
63 takes 6 bits to represent, and equals 0b111111 in binary
64 takes 7 bits to represent, and equals 0b1000000 in binary
65 takes 7 bits to represent, and equals 0b1000001 in binary
66 takes 7 bits to represent, and equals 0b1000010 in binary
67 takes 7 bits to represent, and equals 0b1000011 in binary
68 takes 7 bits to represent, and equals 0b1000100 in binary
69 takes 7 bits to represent, and equals 0b1000101 in binary
70 takes 7 bits to represent, and equals 0b1000110 in binary
71 takes 7 bits to represent, and equals 0b1000111 in binary
72 takes 7 bits to represent, and equals 0b1001000 in binary
73 takes 7 bits to represent, and equals 0b1001001 in binary
74 takes 7 bits to represent, and equals 0b1001010 in binary
75 takes 7 bits to represent, and equals 0b1001011 in binary
76 takes 7 bits to represent, and equals 0b1001100 in binary
77 takes 7 bits to represent, and equals 0b1001101 in binary
78 takes 7 bits to represent, and equals 0b1001110 in binary
79 takes 7 bits to represent, and equals 0b1001111 in binary
80 takes 7 bits to represent, and equals 0b1010000 in binary
81 takes 7 bits to represent, and equals 0b1010001 in binary
82 takes 7 bits to represent, and equals 0b1010010 in binary
83 takes 7 bits to represent, and equals 0b1010011 in binary
84 takes 7 bits to represent, and equals 0b1010100 in binary
85 takes 7 bits to represent, and equals 0b1010101 in binary
86 takes 7 bits to represent, and equals 0b1010110 in binary
87 takes 7 bits to represent, and equals 0b1010111 in binary
88 takes 7 bits to represent, and equals 0b1011000 in binary
89 takes 7 bits to represent, and equals 0b1011001 in binary
90 takes 7 bits to represent, and equals 0b1011010 in binary
91 takes 7 bits to represent, and equals 0b1011011 in binary
92 takes 7 bits to represent, and equals 0b1011100 in binary
93 takes 7 bits to represent, and equals 0b1011101 in binary
94 takes 7 bits to represent, and equals 0b1011110 in binary
95 takes 7 bits to represent, and equals 0b1011111 in binary
96 takes 7 bits to represent, and equals 0b1100000 in binary
97 takes 7 bits to represent, and equals 0b1100001 in binary
98 takes 7 bits to represent, and equals 0b1100010 in binary
99 takes 7 bits to represent, and equals 0b1100011 in binary
100 takes 7 bits to represent, and equals 0b1100100 in binary
101 takes 7 bits to represent, and equals 0b1100101 in binary
102 takes 7 bits to represent, and equals 0b1100110 in binary
103 takes 7 bits to represent, and equals 0b1100111 in binary
104 takes 7 bits to represent, and equals 0b1101000 in binary
105 takes 7 bits to represent, and equals 0b1101001 in binary
106 takes 7 bits to represent, and equals 0b1101010 in binary
107 takes 7 bits to represent, and equals 0b1101011 in binary
108 takes 7 bits to represent, and equals 0b1101100 in binary
109 takes 7 bits to represent, and equals 0b1101101 in binary
110 takes 7 bits to represent, and equals 0b1101110 in binary
111 takes 7 bits to represent, and equals 0b1101111 in binary
112 takes 7 bits to represent, and equals 0b1110000 in binary
113 takes 7 bits to represent, and equals 0b1110001 in binary
114 takes 7 bits to represent, and equals 0b1110010 in binary
115 takes 7 bits to represent, and equals 0b1110011 in binary
116 takes 7 bits to represent, and equals 0b1110100 in binary
117 takes 7 bits to represent, and equals 0b1110101 in binary
118 takes 7 bits to represent, and equals 0b1110110 in binary
119 takes 7 bits to represent, and equals 0b1110111 in binary
120 takes 7 bits to represent, and equals 0b1111000 in binary
121 takes 7 bits to represent, and equals 0b1111001 in binary
122 takes 7 bits to represent, and equals 0b1111010 in binary
123 takes 7 bits to represent, and equals 0b1111011 in binary
124 takes 7 bits to represent, and equals 0b1111100 in binary
125 takes 7 bits to represent, and equals 0b1111101 in binary
126 takes 7 bits to represent, and equals 0b1111110 in binary
127 takes 7 bits to represent, and equals 0b1111111 in binary
128 takes 8 bits to represent, and equals 0b10000000 in binary
129 takes 8 bits to represent, and equals 0b10000001 in binary
130 takes 8 bits to represent, and equals 0b10000010 in binary
131 takes 8 bits to represent, and equals 0b10000011 in binary
132 takes 8 bits to represent, and equals 0b10000100 in binary
133 takes 8 bits to represent, and equals 0b10000101 in binary
134 takes 8 bits to represent, and equals 0b10000110 in binary
135 takes 8 bits to represent, and equals 0b10000111 in binary
136 takes 8 bits to represent, and equals 0b10001000 in binary
137 takes 8 bits to represent, and equals 0b10001001 in binary
138 takes 8 bits to represent, and equals 0b10001010 in binary
139 takes 8 bits to represent, and equals 0b10001011 in binary
140 takes 8 bits to represent, and equals 0b10001100 in binary
141 takes 8 bits to represent, and equals 0b10001101 in binary
142 takes 8 bits to represent, and equals 0b10001110 in binary
143 takes 8 bits to represent, and equals 0b10001111 in binary
144 takes 8 bits to represent, and equals 0b10010000 in binary
145 takes 8 bits to represent, and equals 0b10010001 in binary
146 takes 8 bits to represent, and equals 0b10010010 in binary
147 takes 8 bits to represent, and equals 0b10010011 in binary
148 takes 8 bits to represent, and equals 0b10010100 in binary
149 takes 8 bits to represent, and equals 0b10010101 in binary
150 takes 8 bits to represent, and equals 0b10010110 in binary
151 takes 8 bits to represent, and equals 0b10010111 in binary
152 takes 8 bits to represent, and equals 0b10011000 in binary
153 takes 8 bits to represent, and equals 0b10011001 in binary
154 takes 8 bits to represent, and equals 0b10011010 in binary
155 takes 8 bits to represent, and equals 0b10011011 in binary
156 takes 8 bits to represent, and equals 0b10011100 in binary
157 takes 8 bits to represent, and equals 0b10011101 in binary
158 takes 8 bits to represent, and equals 0b10011110 in binary
159 takes 8 bits to represent, and equals 0b10011111 in binary
160 takes 8 bits to represent, and equals 0b10100000 in binary
161 takes 8 bits to represent, and equals 0b10100001 in binary
162 takes 8 bits to represent, and equals 0b10100010 in binary
163 takes 8 bits to represent, and equals 0b10100011 in binary
164 takes 8 bits to represent, and equals 0b10100100 in binary
165 takes 8 bits to represent, and equals 0b10100101 in binary
166 takes 8 bits to represent, and equals 0b10100110 in binary
167 takes 8 bits to represent, and equals 0b10100111 in binary
168 takes 8 bits to represent, and equals 0b10101000 in binary
169 takes 8 bits to represent, and equals 0b10101001 in binary
170 takes 8 bits to represent, and equals 0b10101010 in binary
171 takes 8 bits to represent, and equals 0b10101011 in binary
172 takes 8 bits to represent, and equals 0b10101100 in binary
173 takes 8 bits to represent, and equals 0b10101101 in binary
174 takes 8 bits to represent, and equals 0b10101110 in binary
175 takes 8 bits to represent, and equals 0b10101111 in binary
176 takes 8 bits to represent, and equals 0b10110000 in binary
177 takes 8 bits to represent, and equals 0b10110001 in binary
178 takes 8 bits to represent, and equals 0b10110010 in binary
179 takes 8 bits to represent, and equals 0b10110011 in binary
180 takes 8 bits to represent, and equals 0b10110100 in binary
181 takes 8 bits to represent, and equals 0b10110101 in binary
182 takes 8 bits to represent, and equals 0b10110110 in binary
183 takes 8 bits to represent, and equals 0b10110111 in binary
184 takes 8 bits to represent, and equals 0b10111000 in binary
185 takes 8 bits to represent, and equals 0b10111001 in binary
186 takes 8 bits to represent, and equals 0b10111010 in binary
187 takes 8 bits to represent, and equals 0b10111011 in binary
188 takes 8 bits to represent, and equals 0b10111100 in binary
189 takes 8 bits to represent, and equals 0b10111101 in binary
190 takes 8 bits to represent, and equals 0b10111110 in binary
191 takes 8 bits to represent, and equals 0b10111111 in binary
192 takes 8 bits to represent, and equals 0b11000000 in binary
193 takes 8 bits to represent, and equals 0b11000001 in binary
194 takes 8 bits to represent, and equals 0b11000010 in binary
195 takes 8 bits to represent, and equals 0b11000011 in binary
196 takes 8 bits to represent, and equals 0b11000100 in binary
197 takes 8 bits to represent, and equals 0b11000101 in binary
198 takes 8 bits to represent, and equals 0b11000110 in binary
199 takes 8 bits to represent, and equals 0b11000111 in binary
200 takes 8 bits to represent, and equals 0b11001000 in binary
201 takes 8 bits to represent, and equals 0b11001001 in binary
202 takes 8 bits to represent, and equals 0b11001010 in binary
203 takes 8 bits to represent, and equals 0b11001011 in binary
204 takes 8 bits to represent, and equals 0b11001100 in binary
205 takes 8 bits to represent, and equals 0b11001101 in binary
206 takes 8 bits to represent, and equals 0b11001110 in binary
207 takes 8 bits to represent, and equals 0b11001111 in binary
208 takes 8 bits to represent, and equals 0b11010000 in binary
209 takes 8 bits to represent, and equals 0b11010001 in binary
210 takes 8 bits to represent, and equals 0b11010010 in binary
211 takes 8 bits to represent, and equals 0b11010011 in binary
212 takes 8 bits to represent, and equals 0b11010100 in binary
213 takes 8 bits to represent, and equals 0b11010101 in binary
214 takes 8 bits to represent, and equals 0b11010110 in binary
215 takes 8 bits to represent, and equals 0b11010111 in binary
216 takes 8 bits to represent, and equals 0b11011000 in binary
217 takes 8 bits to represent, and equals 0b11011001 in binary
218 takes 8 bits to represent, and equals 0b11011010 in binary
219 takes 8 bits to represent, and equals 0b11011011 in binary
220 takes 8 bits to represent, and equals 0b11011100 in binary
221 takes 8 bits to represent, and equals 0b11011101 in binary
222 takes 8 bits to represent, and equals 0b11011110 in binary
223 takes 8 bits to represent, and equals 0b11011111 in binary
224 takes 8 bits to represent, and equals 0b11100000 in binary
225 takes 8 bits to represent, and equals 0b11100001 in binary
226 takes 8 bits to represent, and equals 0b11100010 in binary
227 takes 8 bits to represent, and equals 0b11100011 in binary
228 takes 8 bits to represent, and equals 0b11100100 in binary
229 takes 8 bits to represent, and equals 0b11100101 in binary
230 takes 8 bits to represent, and equals 0b11100110 in binary
231 takes 8 bits to represent, and equals 0b11100111 in binary
232 takes 8 bits to represent, and equals 0b11101000 in binary
233 takes 8 bits to represent, and equals 0b11101001 in binary
234 takes 8 bits to represent, and equals 0b11101010 in binary
235 takes 8 bits to represent, and equals 0b11101011 in binary
236 takes 8 bits to represent, and equals 0b11101100 in binary
237 takes 8 bits to represent, and equals 0b11101101 in binary
238 takes 8 bits to represent, and equals 0b11101110 in binary
239 takes 8 bits to represent, and equals 0b11101111 in binary
240 takes 8 bits to represent, and equals 0b11110000 in binary
241 takes 8 bits to represent, and equals 0b11110001 in binary
242 takes 8 bits to represent, and equals 0b11110010 in binary
243 takes 8 bits to represent, and equals 0b11110011 in binary
244 takes 8 bits to represent, and equals 0b11110100 in binary
245 takes 8 bits to represent, and equals 0b11110101 in binary
246 takes 8 bits to represent, and equals 0b11110110 in binary
247 takes 8 bits to represent, and equals 0b11110111 in binary
248 takes 8 bits to represent, and equals 0b11111000 in binary
249 takes 8 bits to represent, and equals 0b11111001 in binary
250 takes 8 bits to represent, and equals 0b11111010 in binary
251 takes 8 bits to represent, and equals 0b11111011 in binary
252 takes 8 bits to represent, and equals 0b11111100 in binary
253 takes 8 bits to represent, and equals 0b11111101 in binary
254 takes 8 bits to represent, and equals 0b11111110 in binary
255 takes 8 bits to represent, and equals 0b11111111 in binary
256 takes 9 bits to represent, and equals 0b100000000 in binary
If you look carefully at the values in the output, you can notice some interesting bit patterns, e.g.:

1. Look at the bit patterns for the values of (2 ** n) - 1, i.e. values one less than each power of 2.
2. The same for the values halfway between any two adjacent powers of 2.

Notice any patterns or regularities?
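If you want to check your observations afterwards, the two cases above can be printed side by side with a small sketch like this one. The function names all_ones and halfway are my own labels for illustration, and the range is chosen so that every value stays within the 0 to 256 range of the output above.

```python
from __future__ import print_function

def all_ones(n):
    """Return (2 ** n) - 1, the value one less than the n-th power of 2."""
    return (2 ** n) - 1

def halfway(n):
    """Return the value halfway between 2 ** n and 2 ** (n + 1)."""
    return (2 ** n + 2 ** (n + 1)) // 2

# Start at n = 1, since there is no integer halfway between 1 and 2.
for n in range(1, 8):
    print(n, bin(all_ones(n)), bin(halfway(n)))
```

Compare the two binary columns row by row, and against the corresponding rows in the full output.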

The number columns in the output should really be right-justified, and the repeated (and hence redundant) text in between numbers in the rows should be replaced by a header line at the top, but this time, I'm leaving this as an elementary exercise for the reader :)
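If you get stuck on that exercise, here is one possible sketch, not the only way to do it. It assumes column widths of 5, 4 and 9 characters (enough for values up to 256), drops the "0b" prefix from the binary column, and uses a helper name, format_row, that I made up for this illustration.

```python
from __future__ import print_function

# Column widths are an assumption: wide enough for 256 and its 9 binary digits.
ROW_FORMAT = "{:>5} {:>4} {:>9}"

def format_row(an_int):
    """Return one right-justified row: value, bit count, binary digits."""
    # bin() returns a string like "0b101"; [2:] drops the "0b" prefix.
    return ROW_FORMAT.format(an_int, an_int.bit_length(), bin(an_int)[2:])

# Header line, printed once, replaces the repeated text in each row.
print(ROW_FORMAT.format("Value", "Bits", "Binary"))
for an_int in range(0, 256 + 1):
    print(format_row(an_int))
```

The ">" in each format field right-justifies the value within the given width. Whether to keep the "0b" prefix is a matter of taste; keeping it would need a binary column two characters wider.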

Enjoy.

- Vasudev Ram - Online Python training and consulting
