Showing posts with label command-line-utilities. Show all posts
Showing posts with label command-line-utilities. Show all posts

Monday, April 1, 2019

rmline: Python command-line utility to remove lines from a file [Rosetta Code solution]



- By Vasudev Ram - Online Python training / SQL training / Linux training



Pipeline image attribution

Hi readers,

Long time no post. Sorry.

I saw this programming problem about removing lines from a file on Rosetta Code.

Rosetta Code (Wikipedia) is a programming chrestomathy site.

It's a simple problem, so I thought it would make a good example for Python beginners.

So I wrote a program to solve it. To get the benefits of reuse and composition (at the command line), I wrote it as a Unix-style filter.

Here it is, in file rmline.py:
# Author: Vasudev Ram
# Copyright Vasudev Ram
# Product store:
#    https://gumroad.com/vasudevram
# Training (course outlines and testimonials):
#    https://jugad2.blogspot.com/p/training.html
# Blog:
#    https://jugad2.blogspot.com
# Web site:
#    https://vasudevram.github.io
# Twitter:
#    https://twitter.com/vasudevram

# Problem source:
# https://rosettacode.org/wiki/Remove_lines_from_a_file

from __future__ import print_function
import sys

from error_exit import error_exit

# globals 
sa, lsa = sys.argv, len(sys.argv)

def usage():
    print("Usage: {} start_line num_lines file".format(sa[0]))
    print("Usage: other_command | {} start_line num_lines".format(
    sa[0]))

def main():
    # Check number of args.
    if lsa < 3:
        usage()
        sys.exit(0)

    # Convert number args to ints.
    try:
        start_line = int(sa[1])
        num_lines = int(sa[2])
    except ValueError as ve:
        error_exit("{}: ValueError: {}".format(sa[0], str(ve)))

    # Validate int ranges.
    if start_line < 1:
        error_exit("{}: start_line ({}) must be > 0".format(sa[0], 
        start_line))
    if num_lines < 1:
        error_exit("{}: num_lines ({}) must be > 0".format(sa[0], 
        num_lines))

    # Decide source of input (stdin or file).
    if lsa == 3:
        in_fil = sys.stdin
    else:
        try:
            in_fil = open(sa[3], "r")
        except IOError as ioe:
            error_exit("{}: IOError: {}".format(sa[0], str(ioe)))

    end_line = start_line + num_lines - 1

    # Read input, skip unwanted lines, write others to output.
    for line_num, line in enumerate(in_fil, 1):
        if line_num < start_line:
            sys.stdout.write(line)
        elif line_num > end_line:
            sys.stdout.write(line)

    in_fil.close()

if __name__ == '__main__':
    main()

Here are a few test text files I tried it on:
$ dir f?.txt/b
f0.txt
f5.txt
f20.txt
f0.txt has 0 bytes.
Contents of f5.txt:
$ type f5.txt
line 1
line 2
line 3
line 4
line 5
f20.txt is similar to f5.txt, but with 20 lines.

Here are a few runs of the program, with output:
$ python rmline.py
Usage: rmline.py start_line num_lines file
Usage: other_command | rmline.py start_line num_lines

$ dir | python rmline.py
Usage: rmline.py start_line num_lines file
Usage: other_command | rmline.py start_line num_lines
Both the above runs show that when called with an invalid set of
arguments (none, in this case), it prints a usage message and exits.
$ python rmline.py f0.txt
Usage: rmline.py start_line num_lines file
Usage: other_command | rmline.py start_line num_lines
Same result, except I gave an invalid first (and only) argument, a file name. See the usage() function in the code to know the right order and types of arguments.
$ python rmline.py -3 4 f0.txt
rmline.py: start_line (-3) must be > 0

$ python rmline.py 2 0 f0.txt
rmline.py: num_lines (0) must be > 0
The above two runs shows that it checks for invalid values for the
first two expected integer argyuments, start_line and num_line.
$ python rmline.py 1 2 f0.txt
For an empty input file, as expected, it both removes and prints nothing.
$ python rmline.py 1 2 f5.txt
line 3
line 4
line 5
The above run shows it removing lines 1 through 2 (start_line = 1, num_lines = 2) of the input from the output.
$ python rmline.py 7 4 f5.txt
line 1
line 2
line 3
line 4
line 5
The above run shows that if you give a starting line number larger than the last input line number, it removes no lines of the input.
$ python rmline.py 1 10 f20.txt
line 11
line 12
line 13
line 14
line 15
line 16
line 17
line 18
line 19
line 20
The above run shows it removing the first 10 lines of the input.
$ python rmline.py 6 10 f20.txt
line 1
line 2
line 3
line 4
line 5
line 16
line 17
line 18
line 19
line 20
The above run shows it removing the middle 10 lines of the input.
$ python rmline.py 11 10 f20.txt
line 1
line 2
line 3
line 4
line 5
line 6
line 7
line 8
line 9
line 10
The above run shows it removing the last 10 lines of the input.

Read more:

Pipeline (computing)

Redirection (computing)

The image at the top of the post is of a Unix-style pipeline, with standard input (stdin), standard output (stdout) and standard error (stderr) streams of programs, all independently redirectable, and with the standard output of a preceding command piped to the standard input of the succeeding command in the pipeline. Pipelines and I/O redirection are one of the powerful features of the Unix operating system and shell.

Read a brief introduction to those concepts in an article I wrote for IBM developerWorks:

Developing a Linux command-line utility

The above link is to a post about that utility on my blog. For the
actual code for the utility (in C), and for the PDF of the article,
follow the relevant links in the post.

I had originally written the utility for production use for one of the
largest motorcycle manufacturers in the world.

Enjoy.


Saturday, May 5, 2018

A Python version of the Linux watch command

By Vasudev Ram



Watcher image attribution: Yours truly

Hi readers,

[ Update: A note to those reading this post via Planet Python or other aggregators:

Before first pulbishing it, I reviewed the post in Blogger's preview mode, and it appeared okay, regarding the use of the less-than character, so I did not escape it. I did not know (or did not remember) that Planet Python's behavior may be different. As a result, the code had appeared without the less-than signs in the Planet, thereby garbling it. After noticing this, I fixed the issue in the post. Apologies to those seeing the post twice as a result. ]


I was browsing Linux command man pages (section 1) for some work, and saw the page for an interesting command called watch. I had not come across it before. So I read the watch man page, and after understanding how it works (it's pretty straightforward [1]), thought of creating a Python version of it. I have not tried to implement exactly the same functionality as watch, though, just something similar to it. I called the program watch.py.

[1] The one-line description of the watch command is:

watch - execute a program periodically, showing output fullscreen

How watch.py works:

It is a command-line Python program. It takes an interval argument (in seconds), followed by a command with optional arguments. It runs the command with those arguments, repeatedly, at that interval. (The Linux watch command has a few more options, but I chose not to implement those in this version. I may add some of them [2], and maybe some other features that I thought of, in a future version.)

[2] For example, the -t, -b and -e options should be easy to implement. The -p (--precise) option is interesting. The idea here is that there is always some time "drift" [3] when trying to run a command periodically at some interval, due to unpredictable and variable overhead of other running processes, OS scheduling overhead, and so on. I had experienced this issue earlier when I wrote a program that I called pinger.sh, at a large company where I worked earlier.

[3] You can observe the time drift in the output of the runs of the watch.py program, shown below its code below. Compare the interval with the time shown for successive runs of the same command.

I had written it at the request of some sysadmin friends there, who wanted a tool like that to monitor the uptime of multiple Unix servers on the company network. So I wrote the tool, using a combination of Unix shell, Perl and C. They later told me that it was useful, and they used it to monitor the uptime of multiple servers of the company in different cities. The C part was where the more interesting stuff was, since I used C to write a program (used in the overall shell script) that sort of tried to compensate for the time drift, by doing some calculations about remaining time left, and sleeping for those intervals. It worked somewhat okay, in that it reduced the drift a good amount. I don't remember the exact logic I used for it right now, but do remember finding out later, that the gettimeofday function might have been usable in place of the custom code I wrote to solve the issue. Good fun. I later published the utility and a description of it in the company's Knowledge Management System.

Anyway, back to watch.py: each time, it first prints a header line with the interval, the command string (truncated if needed), and the current date and time, followed by some initial lines of the output of that command (this is what "watching" the command means). It does this by creating a pipe with the command, using subprocess.Popen and then reading the standard output of the command, and printing the first num_lines lines, where num_lines is an argument to the watch() function in the program.

The screen is cleared with "clear" for Linux and "cls" for Windows. Using "echo ^L" instead of "clear" works on some Linux systems, so changing the clear screen command to that may make the program a little faster, on systems where echo is a shell built-in, since there will be no need to load the clear command into memory each time [4]. (As a small aside, on earlier Unix systems I've worked on, on which there was sometimes no clear command (or it was not installed), as a workaround, I used to write a small C program that printed 25 newlines to the screen, and compile and install that as a command called clear or cls :)

[4] Although, on recent Windows and Linux systems, after a program is run once, if you run it multiple times a short while later, I've noticed that the startup time is faster from the second time onwards. I guess this is because the OS loads the program code into a memory cache in some way, and runs it from there for the later times it is called. Not sure if this is the same as the OS buffer cache, which I think is only for data. I don't know if there is a standard name for this technique. I've noticed for sure, that when running Python programs, for example, the first time you run:

python some_script_name.py

it takes a bit of time - maybe a second or three, but after the first time, it starts up faster. Of course this speedup disappears when you run the same program after a bigger gap, say the next day, or after a reboot. Presumably this is because that program cache has been cleared.

Here is the code for watch.py.
"""
------------------------------------------------------------------
File: watch.py
Version: 0.1
Purpose: To work somewhat like the Linux watch command.
See: http://man7.org/linux/man-pages/man1/watch.1.html
Does not try to replicate its functionality exactly.

Author: Vasudev Ram
Copyright 2018 Vasudev Ram
Web site: https://vasudevram.github.io
Blog: https://jugad2.blogspot.com
Product store: https://gumroad.com/vasudevram
Twitter: https://mobile.twitter.com/vasudevram
------------------------------------------------------------------
"""

from __future__ import print_function

import sys
import os
from subprocess import Popen, PIPE
import time

from error_exit import error_exit

# Assuming 25-line terminal. Adjust if different.
# If on Unix / Linux, can get value of environment variable 
# COLUMNS (if defined) and use that instead of 80.
DEFAULT_NUM_LINES = 20

def usage(args):
    lines = [
        "Usage: python {} interval command [ argument ... ]".format(
            args[0]),
        "Run command with the given arguments every interval seconds,",
        "and show some initial lines from command's standard output.",
        "Clear screen before each run.",
    ]
    for line in lines:
        sys.stderr.write(line + '\n')

def watch(command, interval, num_lines):
    # Truncate command for display in the header of watch output.
    if len(command) > 50:
        command_str = command[:50] + "..."
    else:
        command_str = command
    hdr_part_1 = "Every {}s: {} ".format(interval, command_str)
    # Assuming 80 columns terminal width. Adjust if different.
    # If on Unix / Linux, can get value of environment variable 
    # COLUMNS (if defined) and use that instead of 80.
    columns = 80
    # Compute pad_len only once, before the loop, because 
    # neither len(hdr_part_1) nor len(hdr_part_2) change, 
    # even though hdr_part_2 is recomputed each time in the loop.
    hdr_part_2 = time.asctime()
    pad_len = columns - len(hdr_part_1) - len(hdr_part_2) - 1
    while True:
        # Clear screen based on OS platform.
        if "win" in sys.platform:
            os.system("cls")
        elif "linux" in sys.platform: 
            os.system("clear")
        hdr_str = hdr_part_1 + (" " * pad_len) + hdr_part_2
        print(hdr_str + "\n")
        # Run the command, read and print its output up to num_lines lines.
        # os.popen is the old deprecated way, Python docs recommend to use 
        # subprocess.Popen.
        #with os.popen(command) as pipe:
        with Popen(command, shell=True, stdout=PIPE).stdout as pipe:
            for line_num, line in enumerate(pipe):
                print(line, end='')
                if line_num >= num_lines:
                    break
        time.sleep(interval)
        hdr_part_2 = time.asctime()

def main():

    sa, lsa = sys.argv, len(sys.argv)

    # Check arguments and exit if invalid.
    if lsa < 3:
        usage(sa)
        error_exit(
        "At least two arguments are needed: interval and command;\n"
        "optional arguments can be given following command.\n")

    try:
        # Get the interval argument as an int.
        interval = int(sa[1])
        if interval < 1:
            error_exit("{}: Invalid interval value: {}".format(sa[0],
                interval))
        # Build the command to run from the remaining arguments.
        command = " ".join(sa[2:])
        # Run the command repeatedly at the given interval.
        watch(command, interval, DEFAULT_NUM_LINES)
    except ValueError as ve:
        error_exit("{}: Caught ValueError: {}".format(sa[0], str(ve)))
    except OSError as ose:
        error_exit("{}: Caught OSError: {}".format(sa[0], str(ose)))
    except Exception as e:
        error_exit("{}: Caught Exception: {}".format(sa[0], str(e)))

if __name__ == "__main__":
    main()
Here is the code for error_exit.py, which watch imports.
# error_exit.py

# Author: Vasudev Ram
# Web site: https://vasudevram.github.io
# Blog: https://jugad2.blogspot.com
# Product store: https://gumroad.com/vasudevram

# Purpose: This module, error_exit.py, defines a function with 
# the same name, error_exit(), which takes a string message 
# as an argument. It prints the message to sys.stderr, or 
# to another file object open for writing (if given as the 
# second argument), and then exits the program.
# The function error_exit can be used when a fatal error condition occurs, 
# and you therefore want to print an error message and exit your program.

import sys

def error_exit(message, dest=sys.stderr):
    dest.write(message)
    sys.exit(1)

def main():
    error_exit("Testing error_exit with dest sys.stderr (default).\n")
    error_exit("Testing error_exit with dest sys.stdout.\n", 
        sys.stdout)
    with open("temp1.txt", "w") as fil:
        error_exit("Testing error_exit with dest temp1.txt.\n", fil)

if __name__ == "__main__":
    main()
Here are some runs of watch.py and their output:
(BTW, the dfs command shown, is from the Quick-and-dirty disk free space checker for Windows post that I had written recently.)

$ python watch.py 15 ping google.com

Every 15s: ping google.com                             Fri May 04 21:15:56 2018

Pinging google.com [2404:6800:4007:80d::200e] with 32 bytes of data:
Reply from 2404:6800:4007:80d::200e: time=117ms
Reply from 2404:6800:4007:80d::200e: time=109ms
Reply from 2404:6800:4007:80d::200e: time=117ms
Reply from 2404:6800:4007:80d::200e: time=137ms

Ping statistics for 2404:6800:4007:80d::200e:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 109ms, Maximum = 137ms, Average = 120ms

Every 15s: ping google.com                             Fri May 04 21:16:14 2018

Pinging google.com [2404:6800:4007:80d::200e] with 32 bytes of data:
Reply from 2404:6800:4007:80d::200e: time=501ms
Reply from 2404:6800:4007:80d::200e: time=56ms
Reply from 2404:6800:4007:80d::200e: time=105ms
Reply from 2404:6800:4007:80d::200e: time=125ms

Ping statistics for 2404:6800:4007:80d::200e:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 56ms, Maximum = 501ms, Average = 196ms

Every 15s: ping google.com                             Fri May 04 21:16:33 2018

Pinging google.com [2404:6800:4007:80d::200e] with 32 bytes of data:
Reply from 2404:6800:4007:80d::200e: time=189ms
Reply from 2404:6800:4007:80d::200e: time=141ms
Reply from 2404:6800:4007:80d::200e: time=245ms
Reply from 2404:6800:4007:80d::200e: time=268ms

Ping statistics for 2404:6800:4007:80d::200e:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 141ms, Maximum = 268ms, Average = 210ms

$ python watch.py 15 c:\ch\bin\date

Every 15s: c:\ch\bin\date                              Tue May 01 00:33:00 2018

Tue May  1 00:33:00 India Standard Time 2018

Every 15s: c:\ch\bin\date                              Tue May 01 00:33:15 2018

Tue May  1 00:33:16 India Standard Time 2018

Every 15s: c:\ch\bin\date                              Tue May 01 00:33:31 2018

Tue May  1 00:33:31 India Standard Time 2018

Every 15s: c:\ch\bin\date                              Tue May 01 00:33:46 2018

Tue May  1 00:33:47 India Standard Time 2018

In one CMD window:

$ d:\temp\fill-and-free-disk-space

In another:

$ python watch.py 10 dfs d:\

Every 10s: dfs d:\                                     Tue May 01 00:43:25 2018
Disk free space on d:\
37666.6 MiB = 36.78 GiB

Every 10s: dfs d:\                                     Tue May 01 00:43:35 2018
Disk free space on d:\
37113.7 MiB = 36.24 GiB

$ python watch.py 20 dir /b "|" sort

Every 20s: dir /b | sort                               Fri May 04 21:29:41 2018

README.txt
runner.py
watch-outputs.txt
watch-outputs2.txt
watch.py
watchnew.py

$ python watch.py 10 ping com.nosuchsite

Every 10s: ping com.nosuchsite                         Fri May 04 21:30:49 2018

Ping request could not find host com.nosuchsite. Please check the name and try again.

$ python watch.py 20 dir z:\

Every 20s: dir z:\                                     Tue May 01 00:54:37 2018
The system cannot find the path specified.

$ python watch.py 2b echo testing
watch.py: Caught ValueError: invalid literal for int() with base 10: '2b'

$ python watch.py 20 foo

Every 20s: foo                                         Fri May 04 21:33:35 2018

'foo' is not recognized as an internal or external command,
operable program or batch file.

$ python watch.py -1 foo
watch.py: Invalid interval value: -1
- Enjoy.

Interested in a Python programming or Linux commands and shell scripting course? I have good experience built over many years of real-life experience, as well as teaching, in both those subject areas. Contact me for course details via my contact page here.

- Vasudev Ram - Online Python training and consulting

Fast web hosting with A2 Hosting

Get updates (via Gumroad) on my forthcoming apps and content.

Jump to posts: Python * DLang * xtopdf

Subscribe to my blog by email

My ActiveState Code recipes

Follow me on: LinkedIn * Twitter

Are you a blogger with some traffic? Get Convertkit:

Email marketing for professional bloggers



Sunday, April 15, 2018

Quick-and-dirty disk free space checker for Windows

By Vasudev Ram


'I mean, if 10 years from now, when you are doing something quick and dirty, you suddenly visualize that I am looking over your shoulders and say to yourself "Dijkstra would not have liked this", well, that would be enough immortality for me.'

Dijkstra quote attribution

Hi readers,

[ This is the follow-up post that I said I would do after this previous post: Quick-and-clean disk usage utility in Python. This follow-up post describes the quick-and-dirty version of the disk space utility, which is the one I wrote first, before the quick-and-clean version linked above. Note that the two utilities do not give the exact same output - the clean one gives more information. Compare the outputs to see the difference. ]

I had a need to periodically check the free space on my disks in Windows. So I thought of semi-automating the process and came up with this quick-and-dirty utility for it. It used the DOS DIR command, a grep utility for Windows, and a simple Python script, all together in a pipeline, with the Python script processing the results provided by the previous two.

I will first show the Python script and then show its usage in a command pipeline together with the DIR command and a grep command. Then will briefly discuss other possible ways of doing this same task.

Here is the Python script, disk_free_space.py:

from __future__ import print_function
import sys

# Author: Vasudev Ram
# Copyright 2018 Vasudev Ram
# Web site: https://vasudevram.github.io
# Blog: https://jugad2.blogspot.com
# Product store: https://gumroad.com/vasudevram
# Software mentoring: https://www.codementor.io/vasudevram

#for line in sys.stdin:

# The first readline (below) is to read and throw away the line with 
# just "STDIN" in it. We do this because the grep tool that is used 
# before this program in the pipeline (see dfs.bat below), adds a line 
# with "STDIN" before the real grep output.
# Another alternative is to use another grep which does not do that; 
# in that case, delete the first readline statement.
line = sys.stdin.readline()

# The second readline (below) gets the line we want, with the free space in bytes.
line = sys.stdin.readline()

if line.endswith("bytes free\n"):
    words = line.split()
    bytes_free_with_commas = words[2]
    try:
        free_space_mb = int(bytes_free_with_commas.replace(
            ",", "")) / 1024.0 / 1024.0
        free_space_gb = free_space_mb / 1024.0 
        print("{:.1f} MiB = {:.2f} GiB".format(
            free_space_mb, free_space_gb))
    except ValueError as ve:
        sys.stdout.write("{}: Caught ValueError: {}\n".format(
            sys.argv[0], str(ve)))
    #break

An alternative method is to remove the first readline call above, and un-comment the for loop line at the top, and the break statement at the bottom. In that approach, the program will loop over all the lines of stdin, but skip processing all of them except for the single line we want, the one that has the pattern "bytes free". This is actually an extra level of checking that mostly will not be needed, since the grep preceding this program in the pipeline, should filter out all lines except for the one we want.

For why I used MiB and GiB units instead of MB and GB, refer to this article Wikipedia article: Mebibyte

Once we have the above program, we call it from the pipeline, which I have wrapped in this batch file, dfs.bat, for convenience, to get the end result we want:
@echo off
echo Disk free space on %1
dir %1 | grep "bytes free" | python c:\util\disk_free_space.py

Here is a run of dfs.bat to get disk free space information for drive D:\ :
$ dfs d:\
Disk free space on d:\
40103.0 MiB = 39.16 GiB
You can run dfs for both C: and D: in one single command like this:
$ dfs c:\ & dfs d:\
(It uses the Windows CMD operator & which means run the command to the left of the ampersand, then run the command to the right.)

Another way of doing the same task as this utility, is to use the Python psutil library. That way is shown in the quick-and-clean utility post linked near the top of this post. That way would be cross-platform, at least between Windows and Linux, as shown in that post. The only small drawback is that you have to install psutil for it to work, whereas this utility does not need it. This one does need a grep, of course.

Yet another way could be to use lower-level Windows file system APIs directly, to get the needed information. In fact, that is probably how psutil does it. I have not looked into that approach yet, but it might be interesting to do so. Might have to use techniques of calling C or C++ code from Python, like ctypes, SWIG or cffi for that, since those Windows APIs are probably written in C or C++. Check out this post for a very simple example on those lines:

Calling C from Python with ctypes

Enjoy.

- Vasudev Ram - Online Python training and consulting

Get fast reliable hosting with A2Hosting.com

Get updates (via Gumroad) on my forthcoming apps and content.

Jump to posts: Python * DLang * xtopdf

Subscribe to my blog by email

My ActiveState Code recipes

Follow me on: LinkedIn * Twitter

Are you a blogger with some traffic? Get Convertkit:

Email marketing for professional bloggers



Sunday, April 8, 2018

Quick-and-clean disk usage utility in Python

By Vasudev Ram


Hard disk image

Hard disk image attribution

Hi readers,

Recently, I thought that I should check the disk space on my PC more often, possibly because of having installed a lot of software on it over a period. As you know, these days, many software apps take up a lot of disk space, sometimes in the range of a gigabyte or more for one app. So I wanted a way to check more frequently whether my disks are close to getting full.

I thought of creating a quick-and-dirty disk free space checker tool in Python, to partially automate this task. Worked out how to do it, and wrote it - initially for Windows only. I called it disk_free_space.py. Ran it to check the disk free space on a few of my disk partitions, and it worked as intended.

Then I slapped my forehead as I realized that I could do it in a cleaner as well as more cross-platform way, using the psutil library, which I knew and had used earlier.

So I wrote another version of the tool using psutil, that I called disk_usage.py.

Here is the code for disk_usage.py:

#----------------------------------------------------------------------
#
# disk_usage.py
#
# Author: Vasudev Ram
# Copyright 2018 Vasudev Ram
# Web site: https://vasudevram.github.io
# Blog: https://jugad2.blogspot.com
# Product store: https://gumroad.com/vasudevram
# Software mentoring: https://www.codementor.io/vasudevram
#
# Description: A Python app to show disk usage.
# Usage: python disk_usage.py path
#
# For the path given as command-line argument, it shows 
# the percentage of space used, and the total, used and 
# free space, in both MiB and GiB. For definitions of
# MiB vs. MB and GiB vs. GB, see:
# https://en.wikipedia.org/wiki/Mebibyte
#
# Requires: The psutil module, see:
# https://psutil.readthedocs.io/
#
#----------------------------------------------------------------------

from __future__ import print_function
import sys
import psutil

BYTES_PER_MIB = 1024.0 * 1024.0

def disk_usage_in_mib(path):
    """ Return disk usage data in MiB. """
    # Here percent means percent used, not percent free.
    total, used, free, percent = psutil.disk_usage(path)
    # psutil returns usage data in bytes, so convert to MiB.
    return total/BYTES_PER_MIB, used/BYTES_PER_MIB, \
    free/BYTES_PER_MIB, percent

def main():
    if len(sys.argv) == 1:
        print("Usage: python {} path".format(sys.argv[0]))
        print("Shows the disk usage for the given path (file system).")
        sys.exit(0)
    path = sys.argv[1]
    try:
        # Get disk usage data.
        total_mib, used_mib, free_mib, percent = disk_usage_in_mib(path)
        # Print disk usage data.
        print("Disk Usage for {} - {:.1f} percent used. ".format( \
        path, percent))
        print("In MiB: {:.0f} total; {:.0f} used; {:.0f} free.".format(
            total_mib, used_mib, free_mib))
        print("In GiB: {:.3f} total; {:.3f} used; {:.3f} free.".format(
            total_mib/1024.0, used_mib/1024.0, free_mib/1024.0))
    except OSError as ose:
        sys.stdout.write("{}: Caught OSError: {}\n".format(
            sys.argv[0], str(ose)))
    except Exception as e:
        sys.stdout.write("{}: Caught Exception: {}\n".format(
            sys.argv[0], str(e)))

if __name__ == '__main__':
    main()

Here is the output from running it a few times:

On Linux:
$ df -BM -h /
Filesystem                  Size  Used Avail Use% Mounted on
/dev/mapper/precise32-root   79G  5.2G   70G   7% /

$ python disk_usage.py /
Disk Usage for / - 6.8 percent used.
In MiB: 80773 total; 5256 used; 71472 free.
In GiB: 78.880 total; 5.132 used; 69.797 free.

$ df -BM -h /boot
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1       228M   24M  192M  12% /boot

$ python disk_usage.py /boot
Disk Usage for /boot - 11.1 percent used.
In MiB: 228 total; 24 used; 192 free.
In GiB: 0.222 total; 0.023 used; 0.187 free.

On Windows:
$ python disk_usage.py d:\
Disk Usage for d:\ - 59.7 percent used.
In MiB: 100000 total; 59667 used; 40333 free.
In GiB: 97.656 total; 58.268 used; 39.388 free.

$ python disk_usage.py h:\
Disk Usage for h:\ - 28.4 percent used.
In MiB: 100 total; 28 used; 72 free.
In GiB: 0.098 total; 0.028 used; 0.070 free.

I had to tweak the df command invocation to be as you see it above, to make the results of my program and those of df to match. This is because of the difference in calculating MB vs. MiB and GB vs. GiB - see Wikipedia link in header comment of my program above, if you do not know the differences.

So this program using psutil is both cleaner and more cross-platform than my original quick-and-dirty one which was only for Windows, but which did not need psutil installed. Pros and cons for both. I will show the latter program in a following post.

The image at the top of the post is of "a newer 2.5-inch (63.5 mm) 6,495 MB HDD compared to an older 5.25-inch full-height 110 MB HDD".

I've worked some years earlier in system engineer roles where I encountered such older models of hard disks, and also had good experiences and learning in solving problems related to them, mainly on Unix machines, including sometimes using Unix commands and tricks of the trade that I learned or discovered, to recover data from systems where the machine or the hard disk had crashed, and of course, often without backups available. Here is one such anecdote, which I later wrote up and published as an article for Linux For You magazine (now called Open Source For You):

How Knoppix saved the day.

Talk of Murphy's Law ...

Enjoy.

- Vasudev Ram - Online Python training and consulting

Get fast reliable hosting with A2Hosting.com

Get updates (via Gumroad) on my forthcoming apps and content.

Jump to posts: Python * DLang * xtopdf

Subscribe to my blog by email

My ActiveState Code recipes

Follow me on: LinkedIn * Twitter

Are you a blogger with some traffic? Get Convertkit:

Email marketing for professional bloggers



Saturday, March 31, 2018

Checking if web sites are online with Python

By Vasudev Ram

Hi readers,

Recently, I thought of writing a small program to check if one or more web sites are online or not. I used the requests Python library with the HTTP HEAD method. I also checked out PycURL for this. It is a thin wrapper over libcurl, the library that powers the well-known and widely used curl command line tool. While PycURL looks powerful and fast (since it is a thin wrapper that exposes most or all of the functionality of libcurl), I decided to use requests for this version of the program. The code for the program is straightforward, but I found a few interesting things while running it with a few different sites as arguments. I mention those points below.

Here is the tool: I named it is_site_online.py:

"""
is_site_online.py
Purpose: A Python program to check if a site is online or not.
Uses the requests library and the HTTP HEAD method.
Tries both with and without HTTP redirects.
Author: Vasudev Ram
Copyright 2018 Vasudev Ram
Web site: https://vasudevram.github.io
Blog: https://jugad2.blogspot.com
Product store: https://gumroad.com/vasudevram
"""

from __future__ import print_function
import sys
import requests
import time

if len(sys.argv) < 2:
    sys.stderr.write("Usage: {} site ...".format(sys.argv[0]))
    sys.stderr.write("Checks if the given site(s) are online or not.")
    sys.exit(0)

print("Checking if these sites are online or not:")
print("   ".join(sys.argv[1:]))

print("-" * 60)
try:
    for site in sys.argv[1:]:
        for allow_redirects in (False, True):
            tc1 = time.clock()
            r = requests.head(site, allow_redirects=allow_redirects)
            tc2 = time.clock()
            print("Site:", site)
            print("Check with allow_redirects =", allow_redirects)
            print("Results:")
            print("r.ok:", r.ok)
            print("r.status_code:", r.status_code)
            print("request time:", round(tc2 - tc1, 3), "secs")
            print("-" * 60)
except requests.ConnectionError as ce:
    print("Error: ConnectionError: {}".format(ce))
    sys.exit(1)
except requests.exceptions.MissingSchema as ms:
    print("Error: MissingSchema: {}".format(ms))
    sys.exit(1)
except Exception as e:
    print("Error: Exception: {}".format(e))
    sys.exit(1)
The results of some runs of the program:

Check for Google and Yahoo!:

$ python is_site_online.py http://google.com http://yahoo.com
Checking if these sites are online or not:
http://google.com   http://yahoo.com
-----------------------------------------------------------
Site: http://google.com
Check with allow_redirects = False
Results:
r.ok: True
r.status_code: 302
request time: 0.217 secs
------------------------------------------------------------
Site: http://google.com
Check with allow_redirects = True
Results:
r.ok: True
r.status_code: 200
request time: 0.36 secs
------------------------------------------------------------
Site: http://yahoo.com
Check with allow_redirects = False
Results:
r.ok: True
r.status_code: 301
request time: 2.837 secs
------------------------------------------------------------
Site: http://yahoo.com
Check with allow_redirects = True
Results:
r.ok: True
r.status_code: 200
request time: 1.852 secs
------------------------------------------------------------
In the cases where allow_redirects is False, google.com gives a status code of 302 and yahoo.com gives a status code of 301. The 3xx series of codes are related to HTTP redirection.

After seeing this, I looked up HTTP status code information in a few sites such as Wikipedia and the official site www.w3.org (the World Wide Web Consortium), and found a point worth noting. See the part in the Related links section at the end of this post about "302 Found", where it says: "This is an example of industry practice contradicting the standard.".

Now let's check for some error cases:

One error case: we do not give an http:// prefix (assume some novice user who is mixed up about schemes and paths), so they type a garbled site name, say http.om:

$ python is_site_online.py http.om
Checking if these sites are online or not:
http.om
------------------------------------------------------------
Traceback (most recent call last):
  File "is_site_online.py", line 32, in 
    r = requests.head(site, allow_redirects=allow_redirects)
[snip long traceback]
    raise MissingSchema(error)
requests.exceptions.MissingSchema: Invalid URL 'http.om':
No schema supplied. Perhaps you meant http://http.om?
This traceback tells us that when no HTTP 'scheme' [1][2] is given, requests raises a MissingSchema exception. So we now know that we need to catch that exception in our code, by adding another except clause to the try statement, which I later did, in the program you see in this post. In general, this technique can be useful when using a new Python library for the first time: just don't handle any exceptions in the beginning, use it a few times with variations in input or modes of use, and see what sorts of exceptions it throws. Then add code to handle them.

[1] The components of a URL

[2] Parts of URL

Another error case - a made-up site name that does not exist:

$ python is_site_online.py http://abcd.efg
Checking if these sites are online or not:
http://abcd.efg
------------------------------------------------------------
Caught ConnectionError: HTTPConnectionPool(host='abcd.efg',
port=80): Max retries exceeded with url: / (Caused
by NewConnectionError(': Failed
to establish a new connection: [Errno 11004] getaddrinfo
failed',))
From the above error we can see or figure out a few things:

- the requests library defines a ConnectionError exception. I first ran the above command without catching ConnectionError in the program; it gave that error, then I added the handler for it.

- requests uses an HTTP connection pool

- requests does some retries when you try to get() or head() a URL (a site name)

- requests uses urllib3 (from the Python standard library) under the hood

I had discovered that last point earlier too; see this post:

urllib3, the library used by the Python requests library

And as I mentioned in that post, urllib3 itself uses httplib.

Now let's check for some sites that are misspellings of the site google.com:

$ python is_site_online.py http://gogle.com
Checking ...
------------------------------------------------------------
Site: http://gogle.com
With allow_redirects: False
Results:
r.ok: True
r.status_code: 301
request time: 3.377
------------------------------------------------------------
Site: http://gogle.com
With allow_redirects: True
Results:
r.ok: True
r.status_code: 200
request time: 1.982
------------------------------------------------------------

$ python is_site_online.py http://gooogle.com Checking ... ------------------------------------------------------------ Site: http://gooogle.com With allow_redirects: False Results: r.ok: True r.status_code: 301 request time: 0.425 ------------------------------------------------------------ Site: http://gooogle.com With allow_redirects: True Results: r.ok: True r.status_code: 200 request time: 1.216 ------------------------------------------------------------

Interestingly, the results show that that both those misspellings of google.com exist as sites.

It is known that some people register domains that are similar in spelling to well-known / popular / famous domain names, maybe hoping to capture some of the traffic resulting from users mistyping the famous ones. Although I did not plan it that way, I realized, from the above two results for gogle.com and gooogle.com, that this tool can be used to detect the existence of such sites (if they are online when you check, of course).

Related links:

Wikipedia: List_of_HTTP_status_codes

This excerpt from the above Wikipedia page is interesting:

[ 302 Found This is an example of industry practice contradicting the standard. The HTTP/1.0 specification (RFC 1945) required the client to perform a temporary redirect (the original describing phrase was "Moved Temporarily"),[22] but popular browsers implemented 302 with the functionality of a 303 See Other. Therefore, HTTP/1.1 added status codes 303 and 307 to distinguish between the two behaviours.[23] However, some Web applications and frameworks use the 302 status code as if it were the 303.[24] ]

3xx Redirection

W3C: Status Codes

URL redirection

requests docs: redirection section

IBM Knowledge Center: HTTP Status codes and reason phrases

Enjoy.

- Vasudev Ram - Online Python training and consulting

Sell your digital products online at a low monthly rate with SendOwl

Get updates (via Gumroad) on my forthcoming apps and content.

Jump to posts: Python * DLang * xtopdf

Subscribe to my blog by email

My ActiveState Code recipes

Follow me on: LinkedIn * Twitter

Are you a creator of online products? Get Convertkit:

Email marketing for professional bloggers


Monday, April 10, 2017

Porting the text pager from Python to D (DLang)

By Vasudev Ram

$ tp file.txt
$ dir /s | tp


A few weeks ago, I had written this post:

tp, a simple text pager in Python

Some days ago, I thought of porting the pager from Python to D, a.k.a. the D language, and did so. The Python version was called tp.py (tp for Text Pager). So the D version is called tp.d.

The Wikipedia article about D says:

[ Though it originated as a re-engineering of C++, D is a distinct language, having redesigned some core C++ features while also taking inspiration from other languages, notably Java, Python, Ruby, C#, and Eiffel. ]

Here is the code for tp.d.
/*
File: tp.d
Purpose: A simple text pager.
Version: 0.1
Platform: Windows-only.
May be adaptable for Unix using tty / termios calls.
Only the use of getch() may need to be changed.
Author: Vasudev Ram
Copyright 2017 Vasudev Ram
Web site: https://vasudevram.github.io
Blog: https://jugad2.blogspot.com
Product store: https://gumroad.com/vasudevram
Twitter: https://mobile.twitter.com/vasudevram
*/

import std.stdio;
import std.string;
import std.file;

extern (C) int getch();

void usage(string[] args) {
    stderr.writeln("Usage: ", args[0], " text_file");
    stderr.writeln("or");
    stderr.writeln("command_generating_text | ", args[0]);
}

void pager(File in_fil, int lines_per_page=12, char quit_key='q')
{
    assert (lines_per_page > 1);
    int line_count = 0;
    int c;
    foreach (line; in_fil.byLine()) {
        writeln(line);
        line_count++;
        if ( (line_count % lines_per_page) == 0) {
            stdout.flush();
            c = getch();
            if (c == 'q')
                break;
            else
                line_count = 0;
        }
    }
}

int main(string[] args) {

    int lines_per_page = 20;
    File in_fil;

    try {
        if (args.length > 2) {
            usage(args);
            return 1;
        }
        else {
            if (args.length == 2) {
                in_fil = File(args[1]);
            }
            else {
                in_fil = stdin;
            }
            pager(in_fil, lines_per_page, 'q');
        }
    } catch (FileException fe) {
        stderr.writeln("Caught FileException: msg = ", fe.msg);
    } catch (Exception e) {
        stderr.writeln("Caught Exception: msg = ", e.msg);
    }
    finally {
        in_fil.close();
    }
    return 0;
}
Compile it the usual way with:
dmd tp.d
which will create tp.exe.
It can be run in either of two ways (paging a file or stdin), similar to the Python version. Here are two runs, dogfooding it:
tp tp.d
or
type tp.d | tp
As in the case of the Python pager (when run on itself), the output is the same as the source program itself, so I will not show it.

So, now I can say that tp is less than less :) [1] [2]

[1] Like Less is more
[2] Less in two senses of the word: it has less features, and a shorter name.

After writing this D pager, I did a brief visual comparison of it with the Python version, and was interested to see that the overall structure of the code was roughly the same, in the Python and D versions. Opening the file (or using stdin), reading lines from it, closing the file, getting characters from the keyboard, all were roughly similar in behavior. My guess that this is at least partly because both Python and D share a C heritage.

In fact, even the D line:

foreach (line; in_fil.byLine()) {

can, if you squint a bit, be read as the Python line:

for lin in fil.byLine()

which, while not valid Python, could be so, if you had a byLine method on file objects :)

For those interested, there was a good discussion about D on HN recently, here:

The reference D compiler is now open source (dlang.org)

Despite the thread title, it is more about the D language features than the licensing aspects.

Enjoy.

- Vasudev Ram - Online Python training and consulting

Get updates (via Gumroad) on my forthcoming apps and content.

Jump to posts: Python * DLang * xtopdf

Subscribe to my blog by email

My ActiveState Code recipes

Follow me on: LinkedIn * Twitter

Are you a blogger with some traffic? Get Convertkit:

Email marketing for professional bloggers



Friday, March 31, 2017

A Python class like the Unix tee command


By Vasudev Ram


Tee image attribution

Hi readers,

A few days ago, while doing some work with Python and Unix (which I do a lot of), I got the idea of trying to implement something like the Unix tee command, but within Python code - i.e., not as a Python program but as a small Python class that Python programmers could use to get tee-like functionality in their code.

Today I wrote the class and a test program and tried it out. Here is the code, in file tee.py:
# tee.py
# Purpose: A Python class with a write() method which, when 
# used instead of print() or sys.stdout.write(), for writing 
# output, will cause output to go to both sys.stdout and 
# the filename passed to the class's constructor. The output 
# file is called the teefile in the below comments and code.

# The idea is to do something roughly like the Unix tee command, 
# but from within Python code, using this class in your program.

# The teefile will be overwritten if it exists.

# The class also has a writeln() method which is a convenience 
# method that adds a newline at the end of each string it writes, 
# so that the user does not have to.

# Python's string formatting language is supported (without any 
# effort needed in this class), since Python's strings support it, 
# not the print method.

# Author: Vasudev Ram
# Web site: https://vasudevram.github.io
# Blog: https://jugad2.blogspot.com
# Product store: https://gumroad.com/vasudevram

from __future__ import print_function
import sys
from error_exit import error_exit

class Tee(object):
    def __init__(self, tee_filename):
        try:
            self.tee_fil = open(tee_filename, "w")
        except IOError as ioe:
            error_exit("Caught IOError: {}".format(repr(ioe)))
        except Exception as e:
            error_exit("Caught Exception: {}".format(repr(e)))

    def write(self, s):
        sys.stdout.write(s)
        self.tee_fil.write(s)

    def writeln(self, s):
        self.write(s + '\n')

    def close(self):
        try:
            self.tee_fil.close()
        except IOError as ioe:
            error_exit("Caught IOError: {}".format(repr(ioe)))
        except Exception as e:
            error_exit("Caught Exception: {}".format(repr(e)))

def main():
    if len(sys.argv) != 2:
        error_exit("Usage: python {} teefile".format(sys.argv[0]))
    tee = Tee(sys.argv[1])
    tee.write("This is a test of the Tee Python class.\n")
    tee.writeln("It is inspired by the Unix tee command,")
    tee.write("which can send output to both a file and stdout.\n")
    i = 1
    s = "apple"
    tee.writeln("This line has interpolated values like {} and '{}'.".format(i, s))
    tee.close()

if __name__ == '__main__':
    main()
And when I ran it, I got this output:
$ python tee.py test_tee.out
This is a test of the Tee Python class.
It is inspired by the Unix tee command,
which can send output to both a file and stdout.
This line has interpolated values like 1 and 'apple'.

$ type test_tee.out
This is a test of the Tee Python class.
It is inspired by the Unix tee command,
which can send output to both a file and stdout.
This line has interpolated values like 1 and 'apple'.

$ python tee.py test_tee.out > main.out

$ fc /l main.out test_tee.out
Comparing files main.out and TEST_TEE.OUT
FC: no differences encountered
As you can see, I compared the teefile with the redirected stdout output, and they are the same.

I have not implemented the exact same features as the Unix tee. E.g. I did not implement the -a option (to append to a teefile if it exists, instead of overwriting it), and did not implement the option of multiple teefiles. Both are straightforward.

Ideas for the use of this Tee class and programs using it:

- the obvious one - use it like the Unix tee, to both make a copy of some program's output in a file, and show the same output on the screen. We could even pipe the screen (i.e. stdout) output to a Python (or other) text file pager :-)

- to capture intermediate output of some of the commands in a pipeline, before the later commands change it. For another way of doing that, see:

Using PipeController to run a pipe incrementally

- use it to make multiple copies of a file, by implementing the Unix tee command's multiple output file option in the Tee class.

Then we can even use it like this, so we don't get any screen output, and also copy some data to multiple files in a single step:

program_using_tee_class.py >/dev/null # or >NUL if on Windows.

Assuming that multiple teefiles were specified when creating the Tee object that the program will use, this will cause multiple copies of the program's output to be made in different specified teefiles, while the screen output will be thrown away. IOW, it will act like a command to copy some data (the output of the Python program) to multiple locations at the same time, e.g. one could be on a directory on your hard disk, another could be on a USB thumb/pen drive, a third could be on a network share, etc. The advantage here is that by copying from the source only once, to multiple destinations, we avoid reading or generating data multiple times, one for the copy to each destination. This can be more efficient, particularly for large outputs / copies.

For more fun Unixy / Pythonic stdin / stdout / pipe stuff, check out:

[xtopdf] PDFWriter can create PDF from standard input

and a follow-up post, that shows how to use the StdinToPDF program in that post, along with my selpg Unix C utility, to print only selected pages of text to PDF:

Print selected text pages to PDF with Python, selpg and xtopdf on Linux

Enjoy your tea :)

- Vasudev Ram - Online Python training and consulting

Get updates (via Gumroad) on my forthcoming apps and content.

Jump to posts: Python * DLang * xtopdf

Subscribe to my blog by email

My ActiveState Code recipes

Follow me on: LinkedIn * Twitter

Are you a blogger with some traffic? Get Convertkit:

Email marketing for professional bloggers




Wednesday, March 1, 2017

Show error numbers and codes from the os.errno module

By Vasudev Ram

While browsing the Python standard library docs, in particular the module os.errno, I got the idea of writing this small utility to display os.errno error codes and error names, which are stored in the dict os.errno.errorcode:

Here is the program, os_errno_info.py:
from __future__ import print_function
'''
os_errno_info.py
To show the error codes and 
names from the os.errno module.
Author: Vasudev Ram
Copyright 2017 Vasudev Ram
Web site: https://vasudevram.github.io
Blog: https://jugad2.blogspot.com
Product store: https://gumroad.com/vasudevram
'''

import sys
import os

def main():
    
    print("Showing error codes and names\nfrom the os.errno module:")
    print("Python sys.version:", sys.version[:6])
    print("Number of error codes:", len(os.errno.errorcode))
    print("{0:>4}{1:>8}   {2:<20}    {3:<}".format(\
        "Idx", "Code", "Name", "Message"))
    for idx, key in enumerate(sorted(os.errno.errorcode)):
        print("{0:>4}{1:>8}   {2:<20}    {3:<}".format(\
            idx, key, os.errno.errorcode[key], os.strerror(key)))

if __name__ == '__main__':
    main()
And here is the output on running it:
$ py -2 os_errno_info.py >out2 && gvim out2
Showing error codes and names
from the os.errno module:
Python sys.version: 2.7.12
Number of error codes: 86
 Idx    Code   Name                    Message
   0       1   EPERM                   Operation not permitted
   1       2   ENOENT                  No such file or directory
   2       3   ESRCH                   No such process
   3       4   EINTR                   Interrupted function call
   4       5   EIO                     Input/output error
   5       6   ENXIO                   No such device or address
   6       7   E2BIG                   Arg list too long
   7       8   ENOEXEC                 Exec format error
   8       9   EBADF                   Bad file descriptor
   9      10   ECHILD                  No child processes
  10      11   EAGAIN                  Resource temporarily unavailable
  11      12   ENOMEM                  Not enough space
  12      13   EACCES                  Permission denied
  13      14   EFAULT                  Bad address
  14      16   EBUSY                   Resource device
  15      17   EEXIST                  File exists
  16      18   EXDEV                   Improper link
  17      19   ENODEV                  No such device
  18      20   ENOTDIR                 Not a directory
  19      21   EISDIR                  Is a directory
  20      22   EINVAL                  Invalid argument
  21      23   ENFILE                  Too many open files in system
  22      24   EMFILE                  Too many open files
  23      25   ENOTTY                  Inappropriate I/O control operation
  24      27   EFBIG                   File too large
  25      28   ENOSPC                  No space left on device
  26      29   ESPIPE                  Invalid seek
  27      30   EROFS                   Read-only file system
  28      31   EMLINK                  Too many links
  29      32   EPIPE                   Broken pipe
  30      33   EDOM                    Domain error
  31      34   ERANGE                  Result too large
  32      36   EDEADLOCK               Resource deadlock avoided
  33      38   ENAMETOOLONG            Filename too long
  34      39   ENOLCK                  No locks available
  35      40   ENOSYS                  Function not implemented
  36      41   ENOTEMPTY               Directory not empty
  37      42   EILSEQ                  Illegal byte sequence
  38   10000   WSABASEERR              Unknown error
  39   10004   WSAEINTR                Unknown error
  40   10009   WSAEBADF                Unknown error
  41   10013   WSAEACCES               Unknown error
  42   10014   WSAEFAULT               Unknown error
  43   10022   WSAEINVAL               Unknown error
  44   10024   WSAEMFILE               Unknown error
  45   10035   WSAEWOULDBLOCK          Unknown error
  46   10036   WSAEINPROGRESS          Unknown error
  47   10037   WSAEALREADY             Unknown error
  48   10038   WSAENOTSOCK             Unknown error
  49   10039   WSAEDESTADDRREQ         Unknown error
  50   10040   WSAEMSGSIZE             Unknown error
  51   10041   WSAEPROTOTYPE           Unknown error
  52   10042   WSAENOPROTOOPT          Unknown error
  53   10043   WSAEPROTONOSUPPORT      Unknown error
  54   10044   WSAESOCKTNOSUPPORT      Unknown error
  55   10045   WSAEOPNOTSUPP           Unknown error
  56   10046   WSAEPFNOSUPPORT         Unknown error
  57   10047   WSAEAFNOSUPPORT         Unknown error
  58   10048   WSAEADDRINUSE           Unknown error
  59   10049   WSAEADDRNOTAVAIL        Unknown error
  60   10050   WSAENETDOWN             Unknown error
  61   10051   WSAENETUNREACH          Unknown error
  62   10052   WSAENETRESET            Unknown error
  63   10053   WSAECONNABORTED         Unknown error
  64   10054   WSAECONNRESET           Unknown error
  65   10055   WSAENOBUFS              Unknown error
  66   10056   WSAEISCONN              Unknown error
  67   10057   WSAENOTCONN             Unknown error
  68   10058   WSAESHUTDOWN            Unknown error
  69   10059   WSAETOOMANYREFS         Unknown error
  70   10060   WSAETIMEDOUT            Unknown error
  71   10061   WSAECONNREFUSED         Unknown error
  72   10062   WSAELOOP                Unknown error
  73   10063   WSAENAMETOOLONG         Unknown error
  74   10064   WSAEHOSTDOWN            Unknown error
  75   10065   WSAEHOSTUNREACH         Unknown error
  76   10066   WSAENOTEMPTY            Unknown error
  77   10067   WSAEPROCLIM             Unknown error
  78   10068   WSAEUSERS               Unknown error
  79   10069   WSAEDQUOT               Unknown error
  80   10070   WSAESTALE               Unknown error
  81   10071   WSAEREMOTE              Unknown error
  82   10091   WSASYSNOTREADY          Unknown error
  83   10092   WSAVERNOTSUPPORTED      Unknown error
  84   10093   WSANOTINITIALISED       Unknown error
  85   10101   WSAEDISCON              Unknown error

In the above Python command line, you can of course skip the "&& gvim out2" part. It is just there to automatically open the output file in gVim (text editor) after the utility runs.

The above output was from running it with Python 2.
The utility is written to also work with Python 3.
To change the command line to use Python 3, just change 2 to 3 everywhere in the above Python command :)
(You need to install or already have py, the Python Launcher for Windows, for the py command to work. If you don't have it, or are not on Windows, use python instead of py -2 or py -3 in the above python command line - after having set your OS PATH to point to Python 2 or Python 3 as wanted.)

The only differences in the output are the version message (2.x vs 3.x), and the number of error codes - 86 in Python 2 vs. 101 in Python 3.
Unix people will recognize many of the messages (EACCES, ENOENT, EBADF, etc.) as being familiar ones that you get while programming on Unix.
The error names starting with W are probably Windows-specific errors. Not sure how to get the messages for those, need to look it up. (It currently shows "Unknown error" for them.)

This above Python utility was inspired by an earlier auxiliary utility I wrote, called showsyserr.c, as part of my IBM developerWorks article, Developing a Linux command-line utility (not the main utility described in the article). Following (recursively) the link in the previous sentence will lead you to the code for both the auxiliary and the main utility, as well as the PDF version of the article.

Enjoy.

- Vasudev Ram - Online Python training and consulting

Get updates (via Gumroad) on my forthcoming apps and content.

Jump to posts: Python * DLang * xtopdf

Subscribe to my blog by email

My ActiveState Code recipes

Follow me on: LinkedIn * Twitter

Managed WordPress Hosting by FlyWheel



Saturday, February 11, 2017

tp, a simple text pager in Python

By Vasudev Ram

Yesterday I got this idea of writing a simple text file pager in Python.

Here it is, in file tp.py:
'''
tp.py
Purpose: A simple text pager.
Version: 0.1
Platform: Windows-only.
Can be adapted for Unix using tty / termios calls.
Only the use of msvcrt.getch() needs to be changed.
Author: Vasudev Ram
Copyright 2017 Vasudev Ram
Web site: https://vasudevram.github.io
Blog: https://jugad2.blogspot.com
Product store: https://gumroad.com/vasudevram
'''

import sys
import string
from msvcrt import getch

def pager(in_fil=sys.stdin, lines_per_page=10, quit_key='q'):
    assert lines_per_page > 1 and lines_per_page == int(lines_per_page)
    assert len(quit_key) == 1 and \
        quit_key in (string.ascii_letters + string.digits)
    lin_ctr = 0
    for lin in in_fil:
        sys.stdout.write(lin)
        lin_ctr += 1
        if lin_ctr >= lines_per_page:
            c = getch().lower()
            if c == quit_key.lower():
                break
            else:
                lin_ctr = 0

def main():
    try:
        sa, lsa = sys.argv, len(sys.argv)
        if lsa == 1:
            pager()
        elif lsa == 2:
            with open(sa[1], "r") as in_fil:
                pager(in_fil)
        else:
            sys.stderr.write
            ("Only one input file allowed in this version")
                    
    except IOError as ioe:
        sys.stderr.write("Caught IOError: {}".format(repr(ioe)))
        sys.exit(1)

    except Exception as e:
        sys.stderr.write("Caught Exception: {}".format(repr(e)))
        sys.exit(1)

if __name__ == '__main__':
    main()
I added a couple of assertions for sanity checking.

The logic of the program is fairly straightforward:

- open (for reading) the filename given as command line argument, or just read (the already-open) sys.stdin
- loop over the lines of the file, lines-per-page lines at a time
- read a character from the keyboard (without waiting for Enter, hence the use of msvcrt.getch [1])
- if it is the quit key, quit, else reset line counter and print another batch of lines
- do error handling as needed

[1] The msvcrt module is on Windows only, but there are ways to get equivalent functionality on Unixen; google for phrases like "reading a keypress on Unix without waiting for Enter", and look up Unix terms like tty, termios, curses, cbreak, etc.

And here are two runs of the program that dogfood it, one directly with a file (the program itself) as a command-line argument, and the other with the program at the end of a pipeline; output is not shown since it is the same as the input file, in both cases; you just have to press some key (other than q (which makes it quit), repeatedly, to page through the content):
$ python tp.py tp.py
$type tp.py | python tp.py

I could have golfed the code a bit, but chose not to, in the interest of the Zen of Python. Heck, Python is already Zen enough.

- Vasudev Ram - Online Python training and consulting

Get updates (via Gumroad) on my forthcoming apps and content.

Jump to posts: Python * DLang * xtopdf

Subscribe to my blog by email

My ActiveState Code recipes

Follow me on: LinkedIn * Twitter

Managed WordPress Hosting by FlyWheel



Sunday, January 8, 2017

An Unix seq-like utility in Python

By Vasudev Ram


Due to a chain (or sequence - pun intended :) of thoughts, I got the idea of writing a simple version of the Unix seq utility (command-line) in Python. (Some Unix versions have a similar command called jot.)

Note: I wrote this program just for fun. As the seq Wikipedia page says, modern versions of bash can do the work of seq. But this program may still be useful on Windows - not sure if the CMD shell has seq-like functionality or not. PowerShell probably has it, is my guess.)

The seq command lets you specify one or two or three numbers as command-line arguments (some of which are optional): the start, stop and step values, and it outputs all numbers in that range and with that step between them (default step is 1). I have not tried to exactly emulate seq, instead I've written my own version. One difference is that mine does not support the step argument (so it can only be 1), at least in this version. That can be added later. Another is that I print the numbers with spaces in between them, not newlines. Another is that I don't support floating-point numbers in this version (again, can be added).

The seq command has more uses than the above description might suggest (in fact, it is mainly used for other things than just printing a sequence of numbers - after all, who would have a need to do that much). Here is one example, on Unix (from the Wikipedia article about seq):
# Remove file1 through file17:
for n in `seq 17`
do
    rm file$n
done
Note that those are backquotes or grave accents around seq 17 in the above code snippet. It uses sh / bash syntax, so requires one of them, or a compatible shell.

Here is the code for seq1.py:
'''
seq1.py
Purpose: To act somewhat like the Unix seq command.
Author: Vasudev Ram
Copyright 2017 Vasudev Ram
Web site: https://vasudevram.github.io
Blog: https://jugad2.blogspot.com
Product store: https://gumroad.com/vasudevram
'''

import sys

def main():
    sa, lsa = sys.argv, len(sys.argv)
    if lsa < 2:
        sys.exit(1)
    try:
        start = 1
        if lsa == 2:
            end = int(sa[1])
        elif lsa == 3:
            start = int(sa[1])
            end = int(sa[2])
        else: # lsa > 3
            sys.exit(1)
    except ValueError as ve:
        sys.exit(1)

    for num in xrange(start, end + 1):
        print num, 
    sys.exit(0)
    
if __name__ == '__main__':
    main()
And here are a few runs of seq1.py, and the output of each run, below:
$ py -2 seq1.py

$ py -2 seq1.py 1
1

$ py -2 seq1.py 2
1 2

$ py -2 seq1.py 3
1 2 3

$ py -2 seq1.py 1 1
1

$ py -2 seq1.py 1 2
1 2

$ py -2 seq1.py 1 3
1 2 3

$ py -2 seq1.py 4
1 2 3 4

$ py -2 seq1.py 1 4
1 2 3 4

$ py -2 seq1.py 2 2
2

$ py -2 seq1.py 5 3

$ py -2 seq1.py -6 -2
-6 -5 -4 -3 -2

$ py -2 seq1.py -4 -0
-4 -3 -2 -1 0

$ py -2 seq1.py -5 5
-5 -4 -3 -2 -1 0 1 2 3 4 5

There are many other possible uses for seq, if one uses one's imagination, such as rapidly generating various filenames or directory names, with numbers in them (as a prefix, suffix or in the middle), for testing or other purposes, etc.

- Enjoy.

- Vasudev Ram - Online Python training and consulting

Get updates (via Gumroad) on my forthcoming apps and content.

Jump to posts: Python * DLang * xtopdf

Subscribe to my blog by email

My ActiveState Code recipes

Follow me on: LinkedIn * Twitter

Managed WordPress Hosting by FlyWheel



Friday, December 30, 2016

Jal-Tarang, and a musical alarm clock in Python

By Vasudev Ram

Hi readers,

Season's greetings!

After you check out this video (Ranjana Pradhan playing the Jal Tarang in Sydney, 2006, read the rest of the post below ...



Here is a program that acts as a musical command-line alarm clock. It is an adaptation of the one I created here a while ago:

A simple alarm clock in Python (command-line)

This one plays a musical sound (using the playsound Python library) when the alarm time is reached, instead of beeping like the above one does. I had recently come across and used the playsound library in a Python course I conducted, so I thought of enhancing the earlier alarm clock app to use playsound. Playsound is a nice simple Python module with just one function, also called playsound, which can play either WAV or MP3 audio files on Windows.

The playsound library is described as "a pure Python, cross platform, single function module with no dependencies for playing sounds."

Excerpt from its PyPI page:

[ On Windows, uses windll.winmm. WAVE and MP3 have been tested and are known to work. Other file formats may work as well.

On OS X, uses AppKit.NSSound. WAVE and MP3 have been tested and are known to work. In general, anything QuickTime can play, playsound should be able to play, for OS X.

On Linux, uses ossaudiodev. I don’t have a machine with Linux, so this hasn’t been tested at all. Theoretically, it plays WAVE files. ]

Here is the code for the program, musical_alarm_clock.py:
from __future__ import print_function

'''
musical_alarm_clock.py

Author: Vasudev Ram
Copyright 2016 Vasudev Ram
Web site: https://vasudevram.github.io
Blog: http://jugad2.blogspot.com
Product store: https://gumroad.com/vasudevram

Description: A simple program to make the computer act like 
a musical alarm clock. Start it running from the command line 
with a command line argument specifying the number of minutes 
after which to give the alarm. It will wait for that long, and 
then play a musical sound a few times.
'''

import sys
import string
from playsound import playsound, PlaysoundException
from time import sleep, asctime

sa = sys.argv
lsa = len(sys.argv)
if lsa != 2:
    print("Usage: python {} duration_in_minutes.".format(sys.argv[0]))
    print("Example: python {} 10".format(sys.argv[0]))
    print("Use a value of 0 minutes for testing the alarm immediately.")
    print("The program plays a musical sound a few times after the duration is over.")
    sys.exit(1)

try:
    minutes = int(sa[1])
except ValueError:
    print("Invalid value {} for minutes.".format(sa[1]))
    print("Should be an integer >= 0.")
    sys.exit(1)

if minutes < 0:
    print("Invalid value {} for minutes.".format(minutes))
    print("Should be an integer >= 0.")
    sys.exit(1)

seconds = minutes * 60

if minutes == 1:
    unit_word = "minute"
else:
    unit_word = "minutes"

try:
    print("Current time is {}.".format(asctime()))
    if minutes > 0:
        print("Alarm set for {} {} later.".format(str(minutes), unit_word))
        sleep(seconds)
    else:
        print("Running in immediate test mode, with no delay.")
    print("Alarm time reached at {}.".format(asctime()))
    print("Wake up.")
    for i in range(5):
        playsound(r'c:\windows\media\chimes.wav')        
        #sleep(1.00)
        sleep(0.50)
        #sleep(0.25)
        #sleep(0.10)
except PlaysoundException as pe:
    print("Error: PlaysoundException: message: {}".format(pe))
    sys.exit(1)
except KeyboardInterrupt:
    print("Interrupted by user.")
    sys.exit(1)



Jal Tarang image attribution

The picture above is of a Jal Tarang.

The Jal Tarang is an ancient Indian melodic percussion instrument. Brief description adapted from the Wikipedia article: It consists of a set of bowls filled with water. The bowls can be of different sizes and contain different amounts of water. The instrument is tuned by adjusting the amount of water in each bowl. The music is played by striking the bowls with two sticks.

I've watched live jal-tarang performances only a few times as a kid (it was probably somewhat uncommon even then, and there are very few people who play it nowadays), so it was interesting to see this video and read the article.

Enjoy.

- Vasudev Ram - Online Python training and consulting

Get updates (via Gumroad) on my forthcoming apps and content.

Jump to posts: Python * DLang * xtopdf

Subscribe to my blog by email

My ActiveState Code recipes

Follow me on: LinkedIn * Twitter

Managed WordPress Hosting by FlyWheel



Saturday, December 17, 2016

[xtopdf] Publish DSV data (Delimiter-Separated Values) to PDF

By Vasudev Ram



Here is another in my series of posts about xtopdf, my Python toolkit for PDF creation from other data/file formats. This one is for conversion of DSV data to PDF.

DSV stands for Delimiter-Separated Values, and is a superset or generalization of formats like CSV (Comma-Separated Values) and TSV (Tab-Separated Values). These formats are commonly used in general computing (business, scientific, etc.) and also in data science.

This program, DSVToPDF.py, builds upon the code in the read_dsv.py program in this recent post:

Processing DSV data (Delimiter-Separated Values) with Python

It reads DSV content either from filenames given as command-line arguments, or from standard input, and converts each such input into a PDF file with the same name as the DSV file, but with added extension ".pdf". In the case of standard input, since there is no DSV filename to use as the base, the PDF file is named dsv_output.pdf.

Here is the code for DSVToPDF.py:
from __future__ import print_function

"""
DSVToPDF.py
Author: Vasudev Ram
Web site: https://vasudevram.github.io
Blog: https://jugad2.blogspot.com
Product store: https://gumroad.com/vasudevram
Twitter: https://mobile.twitter.com/vasudevram
Purpose: Show how to publish DSV data (Delimiter-Separated Values) 
to PDF, using the xtopdf toolkit.
Requires:
 - ReportLab: https://www.reportlab.com/ftp/reportlab-1.21.1.tar.gz
 - xtopdf: https://bitbucket.org/vasudevram/xtopdf
First install ReportLab, then install xtopdf, using instructions here:
http://jugad2.blogspot.in/2012/07/guide-to-installing-and-using-xtopdf.html
The DSV data can be read from either files or standard input.
The delimiter character is configurable by the user and can
be specified as either a character or its ASCII code.
References:
DSV format: https://en.wikipedia.org/wiki/Delimiter-separated_values 
TAOUP (The Art Of Unix Programming): Data File Metaformats:
http://www.catb.org/esr/writings/taoup/html/ch05s02.html
ASCII table: http://www.asciitable.com/
"""

import sys
import string
from PDFWriter import PDFWriter

def err_write(message):
    sys.stderr.write(message)

def error_exit(message):
    err_write(message)
    sys.exit(1)

def usage(argv, verbose=False):
    usage1 = \
        "{}: publish DSV (Delimiter-Separated-Values) data to PDF.\n".format(argv[0])
    usage2 = "Usage: python" + \
        " {} [ -c delim_char | -n delim_code ] [ dsv_file ] ...\n".format(argv[0])
    usage3 = [
        "where one of either the -c or -n option must be given,\n",  
        "delim_char is a single ASCII delimiter character, and\n", 
        "delim_code is a delimiter character's ASCII code.\n", 
        "Text lines will be read from specified DSV file(s) or\n", 
        "from standard input, split on the specified delimiter\n", 
        "specified by either the -c or -n option, processed, and\n", 
        "written, formatted, to PDF files with the name dsv_file.pdf.\n", 
    ]
    usage4 = "Use the -h or --help option for a more detailed help message.\n"
    err_write(usage1)
    err_write(usage2)
    if verbose:
        '''
        for line in usage3:
            err_write(line)
        '''
        err_write(''.join(usage3))
    if not verbose:
        err_write(usage4)

def str_to_int(s):
    try:
        return int(s)
    except ValueError as ve:
        error_exit(repr(ve))

def valid_delimiter(delim_code):
    return not invalid_delimiter(delim_code)

def invalid_delimiter(delim_code):
    # Non-ASCII codes not allowed, i.e. codes outside
    # the range 0 to 255.
    if delim_code < 0 or delim_code > 255:
        return True
    # Also, don't allow some specific ASCII codes;
    # add more, if it turns out they are needed.
    if delim_code in (10, 13):
        return True
    return False

def dsv_to_pdf(dsv_fil, delim_char, pdf_filename):
    with PDFWriter(pdf_filename) as pw:
        pw.setFont("Courier", 12)
        pw.setHeader(pdf_filename[:-4] + " => " + pdf_filename)
        pw.setFooter("Generated by xtopdf: https://google.com/search?q=xtopdf")
        for idx, lin in enumerate(dsv_fil):
            fields = lin.split(delim_char)
            assert len(fields) > 0
            # Knock off the newline at the end of the last field,
            # since it is the line terminator, not part of the field.
            if fields[-1][-1] == '\n':
                fields[-1] = fields[-1][:-1]
            # Treat a blank line as a line with one field,
            # an empty string (that is what split returns).
            pw.writeLine(' - '.join(fields))

def main():
    # Get and check validity of arguments.
    sa = sys.argv
    lsa = len(sa)
    if lsa == 1:
        usage(sa)
        sys.exit(0)
    elif lsa == 2:
        # Allow the help option with any letter case.
        if sa[1].lower() in ("-h", "--help"):
            usage(sa, verbose=True)
            sys.exit(0)
        else:
            usage(sa)
            sys.exit(0)

    # If we reach here, lsa is >= 3.
    # Check for valid mandatory options (sic).
    if not sa[1] in ("-c", "-n"):
        usage(sa, verbose=True)
        sys.exit(0)

    # If -c option given ...
    if sa[1] == "-c":
        # If next token is not a single character ...
        if len(sa[2]) != 1:
            error_exit(
            "{}: Error: -c option needs a single character after it.".format(sa[0]))
        if not sa[2] in string.printable:
            error_exit(
            "{}: Error: -c option needs a printable ASCII character after it.".format(\
            sa[0]))
        delim_char = sa[2]
    # else if -n option given ...
    elif sa[1] == "-n":
        delim_code = str_to_int(sa[2])
        if invalid_delimiter(delim_code):
            error_exit(
            "{}: Error: invalid delimiter code {} given for -n option.".format(\
            sa[0], delim_code))
        delim_char = chr(delim_code)
    else:
        # Checking for what should not happen ... a bit of defensive programming here.
        error_exit("{}: Program error: neither -c nor -n option given.".format(sa[0]))

    try:
        # If no filenames given, do sys.stdin to PDF ...
        if lsa == 3:
            print("Converting content of standard input to PDF.")
            dsv_fil = sys.stdin
            dsv_to_pdf(dsv_fil, delim_char, "dsv_output.pdf")
            dsv_fil.close()
            print("Output is in dsv_output.pdf")
        # else (filenames given), convert them to PDFs ...
        else:
            for dsv_filename in sa[3:]:
                pdf_filename = dsv_filename + ".pdf"
                print("Converting file {} to PDF.", dsv_filename)
                dsv_fil = open(dsv_filename, 'r')
                dsv_to_pdf(dsv_fil, delim_char, pdf_filename)
                dsv_fil.close()
                print("Output is in {}".format(pdf_filename))
    except IOError as ioe:
        error_exit("{}: Error: {}".format(sa[0], repr(ioe)))
        
if __name__ == '__main__':
    main()
Note the commented and uncommented lines in the "if verbose:" clause in the usage() function. The latter is shorter, and, methinks, a tad more Pythonic.
I did a sample run with the data shown in the image at the top of this post, which uses the pipe character (|) as the delimiter between fields (124 is the ASCII code for the pipe character):
$ python DSVToPDF.py -n 124 file4.dsv
And here is a cropped screenshot of the PDF output as viewed in Foxit PDF Reader:


- Enjoy.

- Vasudev Ram - Online Python training and consulting

Get updates on my software products / ebooks / courses.

Jump to posts: Python   DLang   xtopdf

Subscribe to my blog by email

My ActiveState recipes

Managed WordPress Hosting by FlyWheel