jugad2 - Vasudev Ram on software innovation: Python-UNIX-utilities


Monday, April 1, 2019

rmline: Python command-line utility to remove lines from a file [Rosetta Code solution]

- By Vasudev Ram - Online Python training / SQL training / Linux training

Pipeline image attribution

Hi readers,

Long time no post. Sorry.

I saw this programming problem about removing lines from a file on Rosetta Code.

Rosetta Code (Wikipedia) is a programming chrestomathy site.

It's a simple problem, so I thought it would make a good example for Python beginners.

So I wrote a program to solve it. To get the benefits of reuse and composition (at the command line), I wrote it as a Unix-style filter.

Here it is, in file rmline.py:

# Author: Vasudev Ram
# Copyright Vasudev Ram
# Product store:
#    https://gumroad.com/vasudevram
# Training (course outlines and testimonials):
#    https://jugad2.blogspot.com/p/training.html
# Blog:
#    https://jugad2.blogspot.com
# Web site:
#    https://vasudevram.github.io
# Twitter:
#    https://twitter.com/vasudevram

# Problem source:
# https://rosettacode.org/wiki/Remove_lines_from_a_file

from __future__ import print_function
import sys

from error_exit import error_exit

# globals 
sa, lsa = sys.argv, len(sys.argv)

def usage():
    print("Usage: {} start_line num_lines file".format(sa[0]))
    print("Usage: other_command | {} start_line num_lines".format(
    sa[0]))

def main():
    # Check number of args.
    if lsa < 3:
        usage()
        sys.exit(0)

    # Convert number args to ints.
    try:
        start_line = int(sa[1])
        num_lines = int(sa[2])
    except ValueError as ve:
        error_exit("{}: ValueError: {}".format(sa[0], str(ve)))

    # Validate int ranges.
    if start_line < 1:
        error_exit("{}: start_line ({}) must be > 0".format(sa[0], 
        start_line))
    if num_lines < 1:
        error_exit("{}: num_lines ({}) must be > 0".format(sa[0], 
        num_lines))

    # Decide source of input (stdin or file).
    if lsa == 3:
        in_fil = sys.stdin
    else:
        try:
            in_fil = open(sa[3], "r")
        except IOError as ioe:
            error_exit("{}: IOError: {}".format(sa[0], str(ioe)))

    end_line = start_line + num_lines - 1

    # Read input, skip unwanted lines, write others to output.
    for line_num, line in enumerate(in_fil, 1):
        if line_num < start_line or line_num > end_line:
            sys.stdout.write(line)

    in_fil.close()

if __name__ == '__main__':
    main()
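
The program imports error_exit from a module of the same name, which is not shown in this post. Judging only from the error messages in the sample runs below, a minimal stand-in might look like the following. This is a guess at its behavior, not the actual module; the exit_code parameter and its default of 1 are assumptions.

```python
# error_exit.py
# Hypothetical stand-in for the error_exit module imported by rmline.py.
# The real module is not shown in the post; this only mirrors the
# behavior visible in the sample runs (print a message, then exit).
import sys

def error_exit(message, exit_code=1):
    """Write message to standard error and exit with exit_code."""
    sys.stderr.write(message + "\n")
    sys.exit(exit_code)
```

Writing the message to standard error rather than standard output keeps it out of any pipeline that consumes the filter's output.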

Here are a few test text files I tried it on:

$ dir f?.txt/b
f0.txt
f5.txt
f20.txt

f0.txt has 0 bytes.
Contents of f5.txt:

$ type f5.txt
line 1
line 2
line 3
line 4
line 5

f20.txt is similar to f5.txt, but with 20 lines.
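
To reproduce the runs below, test files like these can be generated with a few lines of Python. This is only a sketch; the helper names make_test_file and make_all are made up for illustration and are not part of rmline.py.

```python
import os

def make_test_file(path, num_lines):
    """Create a text file containing 'line 1' .. 'line N', one per line."""
    with open(path, "w") as f:
        for i in range(1, num_lines + 1):
            f.write("line {}\n".format(i))

def make_all(directory="."):
    """Create f0.txt (empty), f5.txt and f20.txt in directory."""
    for n in (0, 5, 20):
        make_test_file(os.path.join(directory, "f{}.txt".format(n)), n)
```

Calling make_all() in an empty directory creates the three files used in this post.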

Here are a few runs of the program, with output:

$ python rmline.py
Usage: rmline.py start_line num_lines file
Usage: other_command | rmline.py start_line num_lines

$ dir | python rmline.py
Usage: rmline.py start_line num_lines file
Usage: other_command | rmline.py start_line num_lines

Both the above runs show that when called with an invalid set of
arguments (none, in this case), it prints a usage message and exits.

$ python rmline.py f0.txt
Usage: rmline.py start_line num_lines file
Usage: other_command | rmline.py start_line num_lines

Same result, except I gave an invalid first (and only) argument, a file name. See the usage() function in the code to know the right order and types of arguments.

$ python rmline.py -3 4 f0.txt
rmline.py: start_line (-3) must be > 0

$ python rmline.py 2 0 f0.txt
rmline.py: num_lines (0) must be > 0

The above two runs show that it checks for invalid values for the
first two expected integer arguments, start_line and num_lines.

$ python rmline.py 1 2 f0.txt

For an empty input file, as expected, it both removes and prints nothing.

$ python rmline.py 1 2 f5.txt
line 3
line 4
line 5

The above run shows it removing lines 1 through 2 (start_line = 1, num_lines = 2) of the input from the output.

$ python rmline.py 7 4 f5.txt
line 1
line 2
line 3
line 4
line 5

The above run shows that if you give a starting line number larger than the last input line number, it removes no lines of the input.

$ python rmline.py 1 10 f20.txt
line 11
line 12
line 13
line 14
line 15
line 16
line 17
line 18
line 19
line 20

The above run shows it removing the first 10 lines of the input.

$ python rmline.py 6 10 f20.txt
line 1
line 2
line 3
line 4
line 5
line 16
line 17
line 18
line 19
line 20

The above run shows it removing the middle 10 lines of the input.

$ python rmline.py 11 10 f20.txt
line 1
line 2
line 3
line 4
line 5
line 6
line 7
line 8
line 9
line 10

The above run shows it removing the last 10 lines of the input.
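
The line-skipping logic exercised by these runs can also be written as a small generator, which makes it easy to test without touching files or the command line. This is a sketch that restates the loop in rmline.py; the function name remove_lines is an assumption, not something from the program.

```python
def remove_lines(lines, start_line, num_lines):
    """Yield the items of lines, skipping num_lines items starting at
    the 1-based position start_line."""
    end_line = start_line + num_lines - 1
    for line_num, line in enumerate(lines, 1):
        # Keep every line that falls outside the range to be removed.
        if not (start_line <= line_num <= end_line):
            yield line

# Same data as f5.txt, removing lines 1 through 2.
sample = ["line {}\n".format(i) for i in range(1, 6)]
print("".join(remove_lines(sample, 1, 2)), end="")
```

Because it accepts any iterable of lines, the same function works on an open file, on sys.stdin, or on a list as above.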

Read more:

Pipeline (computing)

Redirection (computing)

The image at the top of the post is of a Unix-style pipeline, with standard input (stdin), standard output (stdout) and standard error (stderr) streams of programs, all independently redirectable, and with the standard output of a preceding command piped to the standard input of the succeeding command in the pipeline. Pipelines and I/O redirection are among the most powerful features of the Unix operating system and shell.
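
As a rough illustration of the three streams being independent, here is a sketch of a filter written as a function that takes its streams as parameters. The function number_lines is invented for this example; passing in-memory streams stands in for shell redirection.

```python
import io
import sys

def number_lines(in_stream=sys.stdin, out_stream=sys.stdout,
                 err_stream=sys.stderr):
    """Copy in_stream to out_stream with line numbers prepended, and
    report the line count on err_stream."""
    count = 0
    for count, line in enumerate(in_stream, 1):
        out_stream.write("{}: {}".format(count, line))
    # Diagnostics go to the error stream, so they never mix with the data.
    err_stream.write("{} line(s) processed\n".format(count))
    return count

# Simulate redirection by passing in-memory streams instead of the real ones.
fake_in = io.StringIO("alpha\nbeta\n")
fake_out, fake_err = io.StringIO(), io.StringIO()
number_lines(fake_in, fake_out, fake_err)
```

After the call, fake_out holds only the numbered data and fake_err holds only the diagnostic, which is the separation that makes filters composable in a pipeline.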

Read a brief introduction to those concepts in an article I wrote for IBM developerWorks:

Developing a Linux command-line utility

The above link is to a post about that utility on my blog. For the
actual code for the utility (in C), and for the PDF of the article,
follow the relevant links in the post.

I had originally written the utility for production use for one of the
largest motorcycle manufacturers in the world.

Enjoy.

Wednesday, May 10, 2017

Python utility like the Unix cut command - Part 1 - cut1.py

By Vasudev Ram

Regular readers of my blog might have noticed that I sometimes write (and write about) command line utilities in posts here.

Recently, I thought of implementing a utility like the Unix cut command in Python.

Here is my first cut at it (pun intended :), below.

However, instead of just posting the final code and some text describing it, this time, I had the idea of doing something different in a post (or a series of posts).

I thought it might be interesting to show some of the stages of development of the utility, such as incremental versions of it, with features or code improvements added in each successive version, and a bit of discussion on the design and implementation, and also on the thought processes occurring during the work.

(For beginners: incidentally, speaking of thought processes, during interviews for programming jobs at many companies, open-ended questions are often asked, where you have to talk about your thoughts as you work your way through to a solution to a problem posed. I know this because I've been on both sides of that many times. This interviewing technique helps the interviewers gauge your thought processes, and thereby, helps them decide whether they think you are good at design and problem-solving or not, which helps decide whether you get the job or not.)

One reason for this format of post is simply that it can be fun (for me to write, and hopefully for others to read; I know I like to read such posts). Another is that one of my lines of business is training (on Python and other areas), and I've found that beginners sometimes have trouble going from a problem or exercise spec to a working implementation, even if they understand the syntax of the language features needed to implement a solution. This can happen because 1) there can be many possible solutions to a given programming problem, and 2) the way to break down a problem into smaller pieces (stepwise refinement) that are more amenable to translating into programming statements and constructs is not always self-evident to beginners. It is a skill one acquires over time, as one keeps programming for months and years.

So I'm going to do it that way (with more explanation and multiple versions), over the course of a few posts.

For this first post, I'll just describe the rudimentary first version that I implemented, and show the code and the output of a few runs of the program.

The Wikipedia link about Unix cut near the top of this post describes its behavior and command line options.

In this first version, I only implement a subset of those:
- reading from a file (only a single file, and not reading from standard input (stdin))
- only support the -c (for cut by column) option (not the -b (by byte) or -f (by field) options)
- only support one column specification, i.e. -cm-n, not forms like -cm1-n1,m2-n2,...

In subsequent versions, I'll add support for some of the omitted features, and also fix any errors that I find in previous versions, by testing.
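
As an illustration of one of the omitted features, the multi-range form -cm1-n1,m2-n2,... could be handled roughly as below. This is a hedged sketch, not code from any later version of the utility: the names parse_ranges and cut_ranges are made up, and the sketch simply concatenates the ranges in the order given, which may differ from what Unix cut does with overlapping or out-of-order ranges.

```python
def parse_ranges(spec):
    """Parse a string like '1-3,6-8' into [(1, 3), (6, 8)].
    Columns are 1-based and inclusive. Raises ValueError on bad input."""
    ranges = []
    for part in spec.split(","):
        start, end = part.split("-")  # ValueError unless exactly 'm-n'
        start, end = int(start), int(end)
        if start < 1 or end < start:
            raise ValueError("bad range: {}".format(part))
        ranges.append((start, end))
    return ranges

def cut_ranges(line, ranges):
    """Return the selected column ranges of line, joined together."""
    # 1-based inclusive columns m..n map to the 0-based slice [m-1:n].
    return "".join(line[start - 1:end] for start, end in ranges)
```

For example, cut_ranges("1234567890", parse_ranges("1-3,6-8")) selects columns 1 to 3 and 6 to 8.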

I'll call the versions cutN.py, where 1 <= N <= the highest version I implement. So this current post is about cut1.py.

Here is the code for cut1.py:

"""
File: cut1.py
Purpose: A Python tool somewhat similar to the Unix cut command.
Does not try to be exactly the same or implement all the features 
of Unix cut. Created for educational purposes.
Author: Vasudev Ram
Copyright 2017 Vasudev Ram
Web site: https://vasudevram.github.io
Blog: https://jugad2.blogspot.com
Product store: https://gumroad.com/vasudevram
"""

from __future__ import print_function

import sys
from error_exit import error_exit

def usage(args):
    #"Cuts the specified columns from each line of the input.\n", 
    lines = [
    "Print only the specified columns from lines of the input.\n", 
    "Usage: {} -cm-n file\n".format(args[0]), 
    "or: cmd_generating_text | {} -cm-n\n".format(args[0]), 
    "where -c means cut by column and\n", 
    "m-n means select (character) columns m to n\n", 
    "For each line, the selected columns will be\n", 
    "written to standard output. Columns start at 1.\n", 
    ]
    for line in lines:
        sys.stderr.write(line)

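def cut(in_fil, start_col, end_col):
    for lin in in_fil:
        # Strip the newline first, so it is never printed as a column
        # (print adds its own newline).
        print(lin.rstrip("\n")[start_col:end_col])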

def main():

    sa, lsa = sys.argv, len(sys.argv)
    # Support only one input file for now.
    # Later extend to support stdin or more than one file.
    if lsa != 3:
        usage(sa)
        sys.exit(1)

    prog_name = sa[0]

    # If first two chars of first arg after script name are not "-c",
    # exit with error.
    if sa[1][:2] != "-c":
        usage(sa)
        error_exit("{}: Expected -c option".format(prog_name))

    # Get the part of first arg after the "-c".
    c_opt_arg = sa[1][2:]
    # Split that on "-".
    c_opt_flds = c_opt_arg.split("-")
    if len(c_opt_flds) != 2:
        error_exit("{}: Expected two field numbers after -c option, like m-n".format(prog_name))

    try:
        start_col = int(c_opt_flds[0])
        end_col = int(c_opt_flds[1])
    except ValueError as ve:
        error_exit("{}: Conversion of either start_col or end_col to int failed".format(
        prog_name))

    if start_col < 1:
        error_exit("Error: start_col ({}) < 1".format(start_col))
    if end_col < 1:
        error_exit("Error: end_col ({}) < 1".format(end_col))
    if end_col < start_col:
        error_exit("Error: end_col < start_col")
    
    try:
        in_fil = open(sa[2], "r")
        cut(in_fil, start_col - 1, end_col)
        in_fil.close()
    except IOError as ioe:
        error_exit("Caught IOError: {}".format(repr(ioe)))

if __name__ == '__main__':
    main()

Here are the outputs of a few runs of cut1.py. I used this text file for the tests.
The line of digits at the top acts like a ruler :) which helps you know what character is at what column:

$ type cut-test-file-01.txt
12345678901234567890123456789012345678901234567890
this is a line with many words in it. how is it.
here is another line which also has many words.
now there is a third line that has some words.
can you believe it, a fourth line exists here.

$ python cut1.py
Print only the specified columns from lines of the input.
Usage: cut1.py -cm-n file
or: cmd_generating_text | cut1.py -cm-n
where -c means cut by column and
m-n means select (character) columns m to n
For each line, the selected columns will be
written to standard output. Columns start at 1.

$ python cut1.py -c
Print only the specified columns from lines of the input.
Usage: cut1.py -cm-n file
or: cmd_generating_text | cut1.py -cm-n
where -c means cut by column and
m-n means select (character) columns m to n
For each line, the selected columns will be
written to standard output. Columns start at 1.

$ python cut1.py -c a
cut1.py: Expected two field numbers after -c option, like m-n

$ python cut1.py -c0-0
Print only the specified columns from lines of the input.
Usage: cut1.py -cm-n file
or: cmd_generating_text | cut1.py -cm-n
where -c means cut by column and
m-n means select (character) columns m to n
For each line, the selected columns will be
written to standard output. Columns start at 1.

$ python cut1.py -c0-0 a
Error: start_col (0) < 1

$ python cut1.py -c1-0 a
Error: end_col (0) < 1

$ python cut1.py -c1-1 a
Caught IOError: IOError(2, 'No such file or directory')

$ python cut1.py -c1-1 cut-test-file-01.txt
1
t
h
n
c

$ python cut1.py -c6-12 cut-test-file-01.txt
6789012
is a li
is anot
here is
ou beli

$ python cut1.py -20-12 cut-test-file-01.txt
Print only the specified columns from lines of the input.
Usage: cut1.py -cm-n file
or: cmd_generating_text | cut1.py -cm-n
where -c means cut by column and
m-n means select (character) columns m to n
For each line, the selected columns will be
written to standard output. Columns start at 1.
cut1.py: Expected -c option

$ python cut1.py -c20-12 cut-test-file-01.txt
Error: end_col < start_col
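
The column arithmetic behind these runs is small enough to isolate in one function. The sketch below (the name cut_columns is an assumption) shows how the 1-based, inclusive columns given on the command line map onto a Python slice, using lines from the test file above.

```python
def cut_columns(line, start_col, end_col):
    """Return columns start_col..end_col (1-based, inclusive) of line."""
    # Strip the line terminator first so it is never counted as a column.
    text = line.rstrip("\r\n")
    # 1-based inclusive columns m..n map to the 0-based slice [m-1:n].
    return text[start_col - 1:end_col]

ruler = "12345678901234567890123456789012345678901234567890\n"
print(cut_columns(ruler, 6, 12))
```

If a line is shorter than end_col, the slice just returns whatever characters exist, so no special case is needed for short lines.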

- Vasudev Ram - Online Python training and consulting


Wednesday, April 13, 2016

A quick console ruler in Python

By Vasudev Ram

I've done this ruler program a few times before, in various languages.

Here is an earlier version: Rule the command-line with ruler.py!

This one is a simplified and also slightly enhanced version of the one above.

It generates a simple text-based ruler on the console.

Can be useful for data processing tasks related to fixed-length or variable-length records, CSV files, etc.

With REPS set to 8, it works just right for a console of 80 columns.

Here is the code:

# ruler.py
"""
Program to display a ruler on the console.
Author: Vasudev Ram
Copyright 2016 Vasudev Ram - http://jugad2.blogspot.com
0123456789, concatenated.
Purpose: By running this program, you can use its output as a ruler,
to find the position of your own program's output on the line, or to 
find the positions and lengths of fields in fixed- or variable-length 
records in a text file, fields in CSV files, etc.
"""

REPS = 8

def ruler(sep=' ', reps=REPS):
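    # Each unit is 10 columns wide: tens digit, 4 spaces, sep, 4 spaces.
    # Built as one string and printed with an explicit newline, so the
    # layout does not depend on the console wrapping at 80 columns.
    print(''.join(str(i) + ' ' * 4 + sep + ' ' * 4 for i in range(reps)))
    print('0123456789' * reps)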

def main():

    # Without divider.
    ruler()

    # With various dividers.
    for sep in '|+!':
        ruler(sep)

if __name__ == '__main__':
    main()

And the output:

$ python ruler.py
0         1         2         3         4         5         6         7         
01234567890123456789012345678901234567890123456789012345678901234567890123456789
0    |    1    |    2    |    3    |    4    |    5    |    6    |    7    |    
01234567890123456789012345678901234567890123456789012345678901234567890123456789
0    +    1    +    2    +    3    +    4    +    5    +    6    +    7    +    
01234567890123456789012345678901234567890123456789012345678901234567890123456789
0    !    1    !    2    !    3    !    4    !    5    !    6    !    7    !    
01234567890123456789012345678901234567890123456789012345678901234567890123456789
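
For consoles that are not 80 columns wide, a variation that returns the two ruler lines for any width might look like this. It is a sketch, not part of ruler.py: the name ruler_lines is made up, and it uses the tens digit modulo 10 so that the alignment holds beyond column 99.

```python
def ruler_lines(width=80, sep=' '):
    """Return (tens_line, digits_line), each exactly width characters."""
    reps = -(-width // 10)  # ceiling division: enough 10-column units
    tens = ''.join(str(i % 10) + ' ' * 4 + sep + ' ' * 4
                   for i in range(reps))
    digits = '0123456789' * reps
    # Trim both lines to the requested width.
    return tens[:width], digits[:width]

for line in ruler_lines(40, '|'):
    print(line)
```

On Python 3.3 and later, the width could be taken from shutil.get_terminal_size().columns instead of being passed in by hand.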

You can also import it as a module in your own program:

# test_ruler.py
from ruler import ruler
ruler()
# Code that outputs the data you want to measure 
# lengths or positions of, goes here ...
print('NAME      AGE  CITY')
ruler()
# ... or here.
print('SOME ONE   20  LON ')
print('ANOTHER    30  NYC ')

$ python test_ruler.py
Output:
0         1         2         3         4         5         6         7         
01234567890123456789012345678901234567890123456789012345678901234567890123456789
NAME      AGE  CITY
0         1         2         3         4         5         6         7         
01234567890123456789012345678901234567890123456789012345678901234567890123456789
SOME ONE   20  LON 
ANOTHER    30  NYC

- Enjoy.

- Vasudev Ram - Online Python training and consulting


Thursday, April 7, 2016

bsplit - binary file split utility in Python

By Vasudev Ram

Some days ago I had written a post about a Unix-like file split utility that I wrote in Python:

Unix split command in Python

I mentioned that unlike the Unix split, I had written mine to only work on text files, because it might be preferable to do it that way (the "do one thing well" idea). I had also said I could write the binary file split as a separate tool. Here it is - bsplit.py:

import sys
import os

OUTFIL_PREFIX = "out_"

def error_exit(message, code=1):
    sys.stderr.write("Error:\n{}".format(str(message)))
    sys.exit(code)

def err_write(message):
    sys.stderr.write(message)

def make_out_filename(prefix, idx):
    '''Make a filename with a serial number suffix.'''
    return prefix + str(idx).zfill(4)

def bsplit(in_filename, bytes_per_file):
    '''Split the input file in_filename into output files of 
    bytes_per_file bytes each. Last file may have less bytes.'''

    in_fil = open(in_filename, "rb")
    outfil_idx = 1
    out_filename = make_out_filename(OUTFIL_PREFIX, outfil_idx)
    out_fil = open(out_filename, "wb")

    byte_count = tot_byte_count = file_count = 0
    c = in_fil.read(1)

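    # Loop over the input and split it into multiple files 
    # of bytes_per_file bytes each (except possibly for the 
    # last file, which may have fewer bytes).
    # An empty read means end of file. Testing truthiness works for both
    # str (Python 2) and bytes (Python 3).
    while c: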
        byte_count += 1
        out_fil.write(c)
        # Bump vars; change to next output file.
        if byte_count >= bytes_per_file:
            tot_byte_count += byte_count
            byte_count = 0
            file_count += 1
            out_fil.close()
            outfil_idx += 1
            out_filename = make_out_filename(OUTFIL_PREFIX, outfil_idx)
            out_fil = open(out_filename, "wb")
        c = in_fil.read(1)
    # Clean up.
    in_fil.close()
    if not out_fil.closed:
        out_fil.close()
    if byte_count == 0:
        os.remove(out_filename)
        
def usage():
    err_write(
    "Usage: [ python ] {} in_filename bytes_per_file\n".format(
        sys.argv[0]))
    err_write(
    "splits in_filename into files with bytes_per_file bytes\n")

def main():

    if len(sys.argv) != 3:
        usage()
        sys.exit(1)

    try:
        # Do some checks on arguments.
        in_filename = sys.argv[1]
        if not os.path.exists(in_filename):
            error_exit(
            "Input file '{}' not found.\n".format(in_filename))
        if os.path.getsize(in_filename) == 0:
            error_exit(
            "Input file '{}' has no data.\n".format(in_filename))
        bytes_per_file = int(sys.argv[2])
        if bytes_per_file <= 0:
            error_exit(
            "bytes_per_file cannot be less than or equal to 0.\n")
        # If all checks pass, split the file.
        bsplit(in_filename, bytes_per_file) 
    except ValueError as ve:
        error_exit(str(ve))
    except IOError as ioe:
        error_exit(str(ioe))
    except Exception as e:
        error_exit(str(e))

if __name__ == '__main__':
    main()

The program takes two command line arguments:
- the name of an input file to split
- the number of bytes per file, into which to split the input file

I tested bsplit with various combinations of test input files and bytes_per_file values. It worked as expected. But if you find any issues, I'd be interested to know - please leave a comment.
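
bsplit.py reads its input one byte at a time, which is simple but can be slow on large files. A possible variation reads each output file's worth of bytes in a single call. This is a sketch under stated assumptions, not a replacement for the program above: the name bsplit_chunked and the prefix parameter are made up, and it holds bytes_per_file bytes in memory at once, so it suits moderate chunk sizes.

```python
def bsplit_chunked(in_filename, bytes_per_file, prefix="out_"):
    """Split in_filename into files of bytes_per_file bytes each.
    The last file may have fewer bytes. Returns the output file names."""
    out_names = []
    with open(in_filename, "rb") as in_fil:
        idx = 1
        while True:
            chunk = in_fil.read(bytes_per_file)
            if not chunk:
                break  # end of input
            # Same naming scheme as bsplit.py: prefix + 4-digit serial.
            name = prefix + str(idx).zfill(4)
            with open(name, "wb") as out_fil:
                out_fil.write(chunk)
            out_names.append(name)
            idx += 1
    return out_names
```

Because an output file is only opened once there is data to write, no empty trailing file is created and nothing needs to be removed afterwards.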

Some other recent posts related to the split / bsplit utilities:

A basic file compare utility in Python

Python one-liner to compare two files (conditions apply)

- Enjoy.

- Vasudev Ram - Online Python training and programming


Tuesday, March 15, 2016

Unix split command in Python

By Vasudev Ram

Recently, there was an HN thread about the implementation (not just use) of text editors. Someone mentioned that some editors, including vim, have problems opening large files. Various people gave workarounds or solutions, including using vim and other ways.

I commented that you can use the Unix command bfs (for big file scanner), if you have it on your system, to open the file read-only and then move around in it, like you can in an editor.

I also said that the Unix commands split and csplit can be used to split a large file into smaller chunks, edit the chunks as needed, and then combine the chunks back into a single file using the cat command.

This made me think of writing, just for fun, a simple version [1] of the split command in Python. So I did that, and then tested it some [2]. Seems to be working okay so far.

[1] I have not implemented the full functionality of the POSIX split command, only a subset, for now. May enhance it with a few command-line options, or more functionality, later, e.g. with the ability to split binary files. I've also not implemented the default size of 1000 lines, or the ability to take input from standard input if no filename is specified. (Both are easy.)

However, I am not sure whether the binary file splitting feature should be a part of split, or should be a separate command, considering the Unix philosophy of doing one thing and doing it well. Binary file splitting seems like it should be a separate task from text file splitting. Maybe it is a matter of opinion.

[2] I tested split.py with various valid and invalid values for the lines_per_file argument (such as -3, -2, -1, 0, 1, 2, 3, 10, 50, 100) on each of these input files:

in_file_0_lines.txt
in_file_1_line.txt
in_file_2_lines.txt
in_file_3_lines.txt
in_file_10_lines.txt
in_file_100_lines.txt

where the meaning of the filenames should be self-explanatory.

Of course, I also checked after each test run, that the output file(s) contained the right data.

(There may still be some bugs, of course. If you find any, I'd appreciate hearing about it.)

Here is the code for split.py:

import sys
import os

OUTFIL_PREFIX = "out_"

def make_out_filename(prefix, idx):
    '''Make a filename with a serial number suffix.'''
    return prefix + str(idx).zfill(4)

def split(in_filename, lines_per_file):
    '''Split the input file in_filename into output files of 
    lines_per_file lines each. Last file may have less lines.'''
    in_fil = open(in_filename, "r")
    outfil_idx = 1
    out_filename = make_out_filename(OUTFIL_PREFIX, outfil_idx)
    out_fil = open(out_filename, "w")
    # Using chain assignment feature of Python.
    line_count = tot_line_count = file_count = 0
    # Loop over the input and split it into multiple files.
    # A text file is an iterable sequence, from Python 2.2,
    # so the for line below works.
    for lin in in_fil:
        # Bump vars; change to next output file.
        if line_count >= lines_per_file:
            tot_line_count += line_count
            line_count = 0
            file_count += 1
            out_fil.close()
            outfil_idx += 1
            out_filename = make_out_filename(OUTFIL_PREFIX, outfil_idx)
            out_fil = open(out_filename, "w")
        line_count += 1
        out_fil.write(lin)
    in_fil.close()
    out_fil.close()
    sys.stderr.write("Output is in file(s) with prefix {}\n".format(OUTFIL_PREFIX))
        
def usage():
    sys.stderr.write(
    "Usage: {} in_filename lines_per_file\n".format(sys.argv[0]))

def main():

    if len(sys.argv) != 3:
        usage()
        sys.exit(1)

    try:
        # Get and validate in_filename.
        in_filename = sys.argv[1]
        # If input file does not exist, exit.
        if not os.path.exists(in_filename):
            sys.stderr.write("Error: Input file '{}' not found.\n".format(in_filename))
            sys.exit(1)
        # If input is empty, exit.
        if os.path.getsize(in_filename) == 0:
            sys.stderr.write("Error: Input file '{}' has no data.\n".format(in_filename))
            sys.exit(1)
        # Get and validate lines_per_file.
        lines_per_file = int(sys.argv[2])
        if lines_per_file <= 0:
            sys.stderr.write("Error: lines_per_file cannot be less than or equal to 0.\n")
            sys.exit(1)
        # If all checks pass, split the file.
        split(in_filename, lines_per_file) 
    except ValueError as ve:
        sys.stderr.write("Caught ValueError: {}\n".format(repr(ve)))
        # Exit with a non-zero status so calling scripts can detect failure.
        sys.exit(1)
    except IOError as ioe:
        sys.stderr.write("Caught IOError: {}\n".format(repr(ioe)))
        sys.exit(1)
    except Exception as e:
        sys.stderr.write("Caught Exception: {}\n".format(repr(e)))
        raise

if __name__ == '__main__':
    main()
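
The two features mentioned in note [1] as not yet implemented, a default of 1000 lines and reading from standard input, both become straightforward if the grouping of lines is separated from the file handling. The following is a sketch of that grouping step only; the name group_lines is an assumption and the function is not part of split.py.

```python
from itertools import islice

def group_lines(lines, lines_per_file=1000):
    """Yield successive lists of at most lines_per_file lines each."""
    it = iter(lines)
    while True:
        # islice takes up to lines_per_file items without reading further.
        group = list(islice(it, lines_per_file))
        if not group:
            return
        yield group
```

Since it accepts any iterable, the same function could be fed an open file or sys.stdin, with each yielded group written to the next output file.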

You can run split.py like this:

$ python split.py
Usage: split.py in_filename lines_per_file

which will give you the usage help. And like this to actually split text files, in this case, a 100-line text file into 10 files of 10 lines each:

$ python split.py in_file_100_lines.txt 10
Output is in file(s) with prefix out_
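
To combine the pieces back into one file, as cat would do on Unix, something like the following could be used. It is a sketch: the name join_files is made up, and it assumes the out_NNNN naming used by split.py with no more than 9999 pieces, so that sorting the names as strings gives the right order.

```python
import glob

def join_files(out_filename, prefix="out_"):
    """Concatenate files named prefix + 4-digit serial into out_filename.
    Returns the list of input file names, in the order they were joined."""
    names = sorted(glob.glob(prefix + "[0-9][0-9][0-9][0-9]"))
    with open(out_filename, "wb") as out_fil:
        for name in names:
            # Binary mode copies the bytes unchanged, whatever the content.
            with open(name, "rb") as in_fil:
                out_fil.write(in_fil.read())
    return names
```

Comparing the joined file with the original input is a quick end-to-end check that the split lost nothing.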

Here are a couple of runs with invalid values for either the input file or the lines_per_file argument:

$ python split.py in_file_100_lines.txt 0
Error: lines_per_file cannot be less than or equal to 0.

$ python split.py not-there.txt 0
Error: Input file 'not-there.txt' not found.

As an aside, thinking about whether to use 0 or 1 as initial value for some of the _count variables in the program, made me remember this topic:

Why programmers count from 0

See the first few hits for some good answers.

And finally, speaking of zero, check out this earlier post by me:

Bhaskaracharya and the man who found zero

- Enjoy.

- Vasudev Ram - Online Python training and programming


Friday, March 20, 2015

A simple UNIX-like "which" command in Python

By Vasudev Ram

UNIX users are familiar with the which command. Given an argument called name, it checks the system PATH environment variable, to see whether that name exists (as a file) in any of the directories specified in the PATH. (The directories in the PATH are colon-separated on UNIX and semicolon-separated on Windows.)

I'd written a Windows-specific version of the which command some time ago, in C.

Today I decided to write a simple version of the which command in Python. In the spirit of YAGNI and incremental development, I tried to resist the temptation to add more features too early; but I did give in once and add the exit code stuff near the end :)

Here is the code for which.py:

from __future__ import print_function

# which.py
# A minimal version of the UNIX which utility, in Python.
# Author: Vasudev Ram - www.dancingbison.com
# Copyright 2015 Vasudev Ram - http://www.dancingbison.com

import sys
import os
import os.path
import stat

def usage():
    sys.stderr.write("Usage: python which.py name\n") 
    sys.stderr.write("or: which.py name\n") 

def which(name):
    found = 0 
    for path in os.getenv("PATH").split(os.path.pathsep):
        full_path = path + os.sep + name
        if os.path.exists(full_path):
            """
            if os.stat(full_path).st_mode & stat.S_IXUSR:
                found = 1
                print(full_path)
            """
            found = 1
            print(full_path)
    # Return a UNIX-style exit code so it can be checked by calling scripts.
    # Programming shortcut to toggle the value of found: 1 => 0, 0 => 1.
    sys.exit(1 - found)

def main():
    if len(sys.argv) != 2:
        usage()
        sys.exit(1)
    which(sys.argv[1])

if "__main__" == __name__:
        main()

And here are a few examples of using the command:

(Note: the tests are done on Windows, though the command prompt is a $ sign (UNIX default); I just set it to that because I like $'s and UNIX :)

$ which vim
\vim

$ which vim.exe
C:\Ch\bin\vim.exe

$ set PATH | grep -i vim73

$ addpath c:\vim\vim73

$ which.py vim.exe
C:\Ch\bin\vim.exe
c:\vim\vim73\vim.exe

$ which metapad.exe
C:\util\metapad.exe

$ which pscp.exe
C:\util\pscp.exe
C:\Ch\bin\pscp.exe

$ which dostounix.exe
C:\util\dostounix.exe

$ which pythonw.exe
C:\Python278\pythonw.exe
D:\Anaconda-2.1.0-64\pythonw.exe

# Which which is which? All four combinations:

$ which which
.\which

$ which.py which
.\which

$ which which.py
.\which.py

$ which.py which.py
.\which.py

As you can see, calling the which Python command with different arguments, gives various results, including sometimes finding one instance of vim.exe and sometimes two instances, depending on the values in the PATH variable (which I changed, using my addpath.bat script, to add the \vim\vim73 directory to it).

Also, it works when invoked either as which.py or just which.

I'll discuss my interpretation of these variations in an upcoming post - including a variation that uses os.stat(full_path).st_mode - see the commented part of the code under the line:

if os.path.exists(full_path):
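
One way such an executable check might be structured is sketched below. Note that it uses os.access with os.X_OK rather than the os.stat(full_path).st_mode test in the commented-out code, and that it returns the matches instead of printing them; the function name find_in_path and its parameters are assumptions made for this example.

```python
import os

def find_in_path(name, path_string=None, require_exec=False):
    """Return the full paths of name found in the PATH directories."""
    if path_string is None:
        path_string = os.environ.get("PATH", "")
    matches = []
    for directory in path_string.split(os.pathsep):
        full_path = os.path.join(directory, name)
        if not os.path.isfile(full_path):
            continue
        # os.access with X_OK asks whether the current user may execute it.
        if require_exec and not os.access(full_path, os.X_OK):
            continue
        matches.append(full_path)
    return matches
```

A calling script could then exit with status 0 if the returned list is non-empty and 1 otherwise, as which.py does. Python 3.3 and later also ship shutil.which in the standard library, which returns only the first match.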

Meanwhile, did you know that YAGNI was written about much before agile was a thing? IIRC, I've seen it described in either Kernighan and Ritchie (The C Programming Language) or in Kernighan and Pike (The UNIX Programming Environment). It could possibly be older than that, say from the mainframe era.

Finally, as I was adding labels to this blog post, Blogger showed me "pywhich" as a label, after I typed "which" in the labels box. That reminded me that I had written another post earlier about a Python which utility (not by me), so I found it on my blog by typing in this URL:

http://jugad2.blogspot.in/search/label/pywhich

which finds all posts on my blog with the label 'pywhich' (and the same approach works for any other label); the resulting post is:

pywhich, like the Unix which tool, for Python modules.

- Enjoy.
