Wednesday, May 10, 2017

Python utility like the Unix cut command - Part 1 - cut1.py

By Vasudev Ram

Regular readers of my blog might have noticed that I sometimes write (and write about) command line utilities in posts here.

Recently, I thought of implementing a utility like the Unix cut command in Python.

Here is my first cut at it (pun intended :), below.

However, instead of just posting the final code and some text describing it, this time, I had the idea of doing something different in a post (or a series of posts).

I thought it might be interesting to show some of the stages of development of the utility, such as incremental versions of it, with features or code improvements added in each successive version, and a bit of discussion on the design and implementation, and also on the thought processes occurring during the work.

(For beginners: incidentally, speaking of thought processes, during interviews for programming jobs at many companies, open-ended questions are often asked, where you have to talk about your thoughts as you work your way through to a solution to a problem posed. I know this because I've been on both sides of that many times. This interviewing technique helps the interviewers gauge your thought processes, and thereby, helps them decide whether they think you are good at design and problem-solving or not, which helps decide whether you get the job or not.)

One reason for using this format is simply that it can be fun (for me to write, and hopefully for others to read; I know I like reading such posts myself). Another is that one of my lines of business is training (on Python and other areas), and I've found that beginners sometimes have trouble going from a problem or exercise spec to a working implementation, even if they understand the syntax of the needed language features well. This can happen because 1) there can be many possible solutions to a given programming problem, and 2) the way to break a problem down into smaller pieces (stepwise refinement) that are more amenable to translation into programming statements and constructs is not always self-evident to beginners. It is a skill one acquires over time, as one keeps programming for months and years.

So I'm going to do it that way (with more explanation and multiple versions), over the course of a few posts.

For this first post, I'll just describe the rudimentary first version that I implemented, and show the code and the output of a few runs of the program.

The Wikipedia link about Unix cut near the top of this post describes its behavior and command line options.

In this first version, I only implement a subset of those:
- read from a file only (a single file; reading from standard input (stdin) is not supported yet)
- support only the -c (cut by column) option, not the -b (by byte) or -f (by field) options
- support only one column specification, i.e. -cm-n, not forms like -cm1-n1,m2-n2,...

In subsequent versions, I'll add support for some of the omitted features, and also fix any errors that I find in previous versions, by testing.

I'll call the versions cutN.py, where 1 <= N <= the highest version I implement. So this current post is about cut1.py.

Here is the code for cut1.py:
"""
File: cut1.py
Purpose: A Python tool somewhat similar to the Unix cut command.
Does not try to be exactly the same or implement all the features 
of Unix cut. Created for educational purposes.
Author: Vasudev Ram
Copyright 2017 Vasudev Ram
Web site: https://vasudevram.github.io
Blog: https://jugad2.blogspot.com
Product store: https://gumroad.com/vasudevram
"""

from __future__ import print_function

import sys
from error_exit import error_exit

def usage(args):
    #"Cuts the specified columns from each line of the input.\n", 
    lines = [
    "Print only the specified columns from lines of the input.\n", 
    "Usage: {} -cm-n file\n".format(args[0]), 
    "or: cmd_generating_text | {} -cm-n\n".format(args[0]), 
    "where -c means cut by column and\n", 
    "m-n means select (character) columns m to n\n", 
    "For each line, the selected columns will be\n", 
    "written to standard output. Columns start at 1.\n", 
    ]
    for line in lines:
        sys.stderr.write(line)

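def cut(in_fil, start_col, end_col):
    for lin in in_fil:
        # Drop the trailing newline first, so that it is not printed twice
        # when end_col reaches past the end of the line.
        print(lin.rstrip("\n")[start_col:end_col])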

def main():

    sa, lsa = sys.argv, len(sys.argv)
    # Support only one input file for now.
    # Later extend to support stdin or more than one file.
    if lsa != 3:
        usage(sa)
        sys.exit(1)

    prog_name = sa[0]

    # If first two chars of first arg after script name are not "-c",
    # exit with error.
    if sa[1][:2] != "-c":
        usage(sa)
        error_exit("{}: Expected -c option".format(prog_name))

    # Get the part of first arg after the "-c".
    c_opt_arg = sa[1][2:]
    # Split that on "-".
    c_opt_flds = c_opt_arg.split("-")
    if len(c_opt_flds) != 2:
        error_exit("{}: Expected two field numbers after -c option, like m-n".format(prog_name))

    try:
        start_col = int(c_opt_flds[0])
        end_col = int(c_opt_flds[1])
    except ValueError as ve:
        error_exit("{}: Conversion of either start_col or end_col to int failed".format(
        prog_name))

    if start_col < 1:
        error_exit("Error: start_col ({}) < 1".format(start_col))
    if end_col < 1:
        error_exit("Error: end_col ({}) < 1".format(end_col))
    if end_col < start_col:
        error_exit("Error: end_col < start_col")
    
    try:
        # The with statement closes the file even if cut() raises an error.
        with open(sa[2], "r") as in_fil:
            cut(in_fil, start_col - 1, end_col)
    except IOError as ioe:
        error_exit("Caught IOError: {}".format(repr(ioe)))

if __name__ == '__main__':
    main()
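The listing above imports error_exit from a module of the same name, which is not shown in this post. A minimal sketch of what such a helper might look like follows; it assumes the function writes the message to standard error and exits with a non-zero status. The exit_code parameter and its default of 1 are my assumptions for illustration, not necessarily what the real module does.

```python
from __future__ import print_function

import sys

def error_exit(message, exit_code=1):
    """Write message to standard error, then exit with exit_code.

    Hypothetical stand-in for the error_exit module used by cut1.py;
    the exit_code parameter is an assumption made for this sketch.
    """
    sys.stderr.write(message + "\n")
    sys.exit(exit_code)
```

Saved as error_exit.py in the same directory as cut1.py, a helper like this would let the import at the top of the script succeed.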
Here are the outputs of a few runs of cut1.py. I used this text file for the tests.
The line of digits at the top acts like a ruler :) which helps you know what character is at what column:
$ type cut-test-file-01.txt
12345678901234567890123456789012345678901234567890
this is a line with many words in it. how is it.
here is another line which also has many words.
now there is a third line that has some words.
can you believe it, a fourth line exists here.
$ python cut1.py
Print only the specified columns from lines of the input.
Usage: cut1.py -cm-n file
or: cmd_generating_text | cut1.py -cm-n
where -c means cut by column and
m-n means select (character) columns m to n
For each line, the selected columns will be
written to standard output. Columns start at 1.

$ python cut1.py -c
Print only the specified columns from lines of the input.
Usage: cut1.py -cm-n file
or: cmd_generating_text | cut1.py -cm-n
where -c means cut by column and
m-n means select (character) columns m to n
For each line, the selected columns will be
written to standard output. Columns start at 1.

$ python cut1.py -c a
cut1.py: Expected two field numbers after -c option, like m-n

$ python cut1.py -c0-0
Print only the specified columns from lines of the input.
Usage: cut1.py -cm-n file
or: cmd_generating_text | cut1.py -cm-n
where -c means cut by column and
m-n means select (character) columns m to n
For each line, the selected columns will be
written to standard output. Columns start at 1.

$ python cut1.py -c0-0 a
Error: start_col (0) < 1

$ python cut1.py -c1-0 a
Error: end_col (0) < 1

$ python cut1.py -c1-1 a
Caught IOError: IOError(2, 'No such file or directory')

$ python cut1.py -c1-1 cut-test-file-01.txt
1
t
h
n
c

$ python cut1.py -c6-12 cut-test-file-01.txt
6789012
is a li
is anot
here is
ou beli

$ python cut1.py -20-12 cut-test-file-01.txt
Print only the specified columns from lines of the input.
Usage: cut1.py -cm-n file
or: cmd_generating_text | cut1.py -cm-n
where -c means cut by column and
m-n means select (character) columns m to n
For each line, the selected columns will be
written to standard output. Columns start at 1.
cut1.py: Expected -c option

$ python cut1.py -c20-12 cut-test-file-01.txt
Error: end_col < start_col
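The runs above rely on one small piece of arithmetic: the columns m to n given on the command line are 1-based and inclusive, while Python slices are 0-based and exclude the end index. That is why main() passes start_col - 1 and end_col to cut(). Here is a small sketch of just that mapping; the function name select_columns is made up for this illustration and is not part of cut1.py.

```python
from __future__ import print_function

def select_columns(line, start_col, end_col):
    """Return the 1-based, inclusive columns start_col..end_col of line."""
    # Column m is at index m - 1. The slice end is exclusive, so
    # using end_col as the end index keeps column n in the result.
    return line[start_col - 1:end_col]

ruler = "12345678901234567890"
print(select_columns(ruler, 6, 12))
print(select_columns("this is a line with many words in it.", 6, 12))
```

A slice that runs past the end of the string just stops at the end, so a short line does not cause an error.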
- Vasudev Ram - Online Python training and consulting

Get updates (via Gumroad) on my forthcoming apps and content.

Jump to posts: Python * DLang * xtopdf

Subscribe to my blog by email

My ActiveState Code recipes

Follow me on: LinkedIn * Twitter

Are you a blogger with some traffic? Get Convertkit:

Email marketing for professional bloggers


Tuesday, May 9, 2017

Meet the world's biggest bovine, the gaur

By Vasudev Ram

I was browsing for information about the Gaur (the Indian bison, my mascot :), when I came across this nice video about them:

Meet the world's biggest bovine, the gaur.

The video is also embedded below.






Tuesday, May 2, 2017

Interesting tech links of the week

By Vasudev Ram

Talk Python to Me podcast: Episode #107: Python concurrency with Curio - David Beazley

Excerpt:

[ You've heard me go on and on about how Python 3.5's async and await features changed the game for asynchronous programming in Python, but what exactly does that mean? How does that look in the APIs? How does it work internally? Today I'm here with David Beazley (@dabeaz) who's been deeply exploring this space with his project Curio. And that's what this episode of Talk Python To Me is all about. It's episode 107 recorded April 14th 2017. ]

Programming as a Way of Thinking (scientificamerican.com)
HN thread about it

Why Use Postgres? (craigkerstiens.com)

There is no pass by reference in Go (cheney.net)

Building a QNX 7 Desktop (membarrier.wordpress.com)

Ask HN: What do you do while your code compiles?

Write Fast Apps Using Async Python 3.6 and Redis (paxos.com)

Understand Go pointers (cheney.net)

Software Developers after 40, 50 and 60 Who're Still Coding (belitsoft.com)

Ask HN: How did you acquire your first 100 users?

Ask HN: How did you grow from 100 to 1,000 users?

Six programming paradigms that will change how you think about coding

The Joy of Concatenative Languages Part 1





Sunday, April 30, 2017

glot.io, an open source pastebin with runnable snippets

By Vasudev Ram

> glot.io

I came across this site today via the Net:

glot.io, an open source pastebin with runnable snippets and API.

It allows you to type or paste in snippets of code and then run them on the site.

glot.io supports 36 languages at the time of checking it, from Assembly to TypeScript and many others in between.

I tried it out with a simple Python snippet. It worked. You can see both the snippet and the start of the output in the screenshot below.


I had blogged about these roughly similar sites earlier:

Codingbat, Progress Graphs and Michael Jordan

repl.it, online REPL for many languages, and empythoned

Online Python Tutor looks quite interesting





Thursday, April 27, 2017

Using nested conditional expressions to classify characters

By Vasudev Ram


While writing some Python code, I happened to use a conditional expression, a Python language feature.

Conditional expressions are expressions (not statements) that have if/else clauses inside them, and they evaluate to either one of two values (in the basic case), depending on the value of a boolean condition. For example:
from __future__ import print_function
for n in range(4):
    print(n, 'is odd' if n % 2 == 1 else 'is even')
0 is even
1 is odd
2 is even
3 is odd
Here, the conditional expression is this part of the print call above:
'is odd' if n % 2 == 1 else 'is even'
This expression evaluates to 'is odd' if the condition after the if is True, and evaluates to 'is even' otherwise. So it evaluates to a string in either case, and that string gets printed (after the value of n).

Excerpt from the section about conditional expressions in the Python Language Reference:

[
conditional_expression ::= or_test ["if" or_test "else" expression]
expression ::= conditional_expression | lambda_expr

Conditional expressions (sometimes called a “ternary operator”) have the lowest priority of all Python operations.

The expression x if C else y first evaluates the condition, C (not x); if C is true, x is evaluated and its value is returned; otherwise, y is evaluated and its value is returned.
]
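The evaluation order described in that excerpt can be checked with a small experiment. The sketch below uses a helper, traced (a name made up for this illustration), which records each operand as it is evaluated. It shows that the condition is evaluated first and that only one of the two branches is evaluated at all.

```python
from __future__ import print_function

calls = []

def traced(label, value):
    """Record that the operand named label was evaluated; return value."""
    calls.append(label)
    return value

# Has the form: x if C else y
result = traced("x", "yes") if traced("C", True) else traced("y", "no")

print(result)  # yes
print(calls)   # ['C', 'x']  (y was never evaluated)
```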

You can see that the definition of conditional_expression is recursive, since it is partly defined in terms of itself (via the definition of expression).

This implies that you can have recursive or nested conditional expressions.
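As a quick illustration of how nesting groups: since the part after else is itself an expression, a chain like a if c1 else b if c2 else c is read as a if c1 else (b if c2 else c). The two functions below (names chosen just for this sketch) differ only in the explicit parentheses, and return the same result.

```python
from __future__ import print_function

def sign(n):
    # Nested conditional expression without parentheses.
    return 'negative' if n < 0 else 'zero' if n == 0 else 'positive'

def sign_explicit(n):
    # The same expression with the implied grouping written out.
    return 'negative' if n < 0 else ('zero' if n == 0 else 'positive')

for n in (-5, 0, 7):
    assert sign(n) == sign_explicit(n)
    print(n, sign(n))
```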

Also, since the syntax of the Python return statement is:
return [ expression_list ]
(where expression_list means one or more expressions, separated by commas), it follows that we can use a nested conditional expression in a return statement (because a nested conditional expression is an expression).

Here is a small program to demonstrate that:
'''
File: return_with_nested_cond_exprs.py 
Purpose: Demonstrate nested conditional expressions used in a return statement, 
to classify letters in a string as lowercase, uppercase or neither.
Also demonstrates doing the same task without a function and a return, 
using a lambda and map instead.
Author: Vasudev Ram
Copyright 2017 Vasudev Ram
Web site: https://vasudevram.github.io
Blog: https://jugad2.blogspot.com
'''

from __future__ import print_function
# ascii_lowercase and ascii_uppercase exist in both Python 2 and Python 3;
# the older names lowercase and uppercase exist only in Python 2.
from string import ascii_lowercase, ascii_uppercase

# Use return with nested conditional expressions inside a function, 
# to classify characters in a string as lowercase, uppercase or neither:
def classify_char(ch):
    return ch + ': ' + ('lowercase' if ch in ascii_lowercase else
    'uppercase' if ch in ascii_uppercase else 'neither')

print("Classify using a function:")
for ch in 'AaBbCc12+-':
    print(classify_char(ch))

print()

# Do it using map and lambda instead of def and for:
print("Classify using map and lambda:")

print('\n'.join(map(lambda ch: ch + ': ' + ('lowercase' if ch in ascii_lowercase else
'uppercase' if ch in ascii_uppercase else 'neither'), 'AaBbCc12+-')))
Running it with:
$ python return_with_nested_cond_exprs.py
gives this output:
Classify using a function:
A: uppercase
a: lowercase
B: uppercase
b: lowercase
C: uppercase
c: lowercase
1: neither
2: neither
+: neither
-: neither

Classify using map and lambda:
A: uppercase
a: lowercase
B: uppercase
b: lowercase
C: uppercase
c: lowercase
1: neither
2: neither
+: neither
-: neither
As you can see from the code and the output, I also used that same nested conditional expression in a lambda function, along with map, to do the same task in a more functional style.
