
Friday, March 31, 2017

A Python class like the Unix tee command


By Vasudev Ram



Hi readers,

A few days ago, while doing some work with Python and Unix (which I do a lot of), I got the idea of trying to implement something like the Unix tee command, but within Python code - i.e., not as a Python program but as a small Python class that Python programmers could use to get tee-like functionality in their code.

Today I wrote the class and a test program and tried it out. Here is the code, in file tee.py:
# tee.py
# Purpose: A Python class with a write() method which, when 
# used instead of print() or sys.stdout.write(), for writing 
# output, will cause output to go to both sys.stdout and 
# the filename passed to the class's constructor. The output 
# file is called the teefile in the below comments and code.

# The idea is to do something roughly like the Unix tee command, 
# but from within Python code, using this class in your program.

# The teefile will be overwritten if it exists.

# The class also has a writeln() method which is a convenience 
# method that adds a newline at the end of each string it writes, 
# so that the user does not have to.

# Python's string formatting language is supported (without any 
# effort needed in this class), since Python's strings support it, 
# not the print method.

# Author: Vasudev Ram
# Web site: https://vasudevram.github.io
# Blog: https://jugad2.blogspot.com
# Product store: https://gumroad.com/vasudevram

from __future__ import print_function
import sys
# error_exit comes from a separate helper module that is not shown in this post.
from error_exit import error_exit

class Tee(object):
    def __init__(self, tee_filename):
        try:
            self.tee_fil = open(tee_filename, "w")
        except IOError as ioe:
            error_exit("Caught IOError: {}".format(repr(ioe)))
        except Exception as e:
            error_exit("Caught Exception: {}".format(repr(e)))

    def write(self, s):
        sys.stdout.write(s)
        self.tee_fil.write(s)

    def writeln(self, s):
        self.write(s + '\n')

    def close(self):
        try:
            self.tee_fil.close()
        except IOError as ioe:
            error_exit("Caught IOError: {}".format(repr(ioe)))
        except Exception as e:
            error_exit("Caught Exception: {}".format(repr(e)))

def main():
    if len(sys.argv) != 2:
        error_exit("Usage: python {} teefile".format(sys.argv[0]))
    tee = Tee(sys.argv[1])
    tee.write("This is a test of the Tee Python class.\n")
    tee.writeln("It is inspired by the Unix tee command,")
    tee.write("which can send output to both a file and stdout.\n")
    i = 1
    s = "apple"
    tee.writeln("This line has interpolated values like {} and '{}'.".format(i, s))
    tee.close()

if __name__ == '__main__':
    main()
And when I ran it, I got this output:
$ python tee.py test_tee.out
This is a test of the Tee Python class.
It is inspired by the Unix tee command,
which can send output to both a file and stdout.
This line has interpolated values like 1 and 'apple'.

$ type test_tee.out
This is a test of the Tee Python class.
It is inspired by the Unix tee command,
which can send output to both a file and stdout.
This line has interpolated values like 1 and 'apple'.

$ python tee.py test_tee.out > main.out

$ fc /l main.out test_tee.out
Comparing files main.out and TEST_TEE.OUT
FC: no differences encountered
As you can see, I compared the teefile with the redirected stdout output, and they are the same.

I have not implemented all the features of the Unix tee. E.g., I did not implement the -a option (to append to a teefile if it exists, instead of overwriting it), or the option of writing to multiple teefiles. Both are straightforward to add.
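
For illustration, here is a minimal sketch of how those two features might be added. It is not the code from tee.py: the class name MultiTee and the append parameter are hypothetical names chosen for this example, it uses Python 3 syntax (a keyword-only argument), and it lets I/O errors propagate instead of calling error_exit.

```python
import sys


class MultiTee:
    """Write to sys.stdout and to any number of teefiles.

    MultiTee and its append parameter are hypothetical names for this sketch.
    """

    def __init__(self, *tee_filenames, append=False):
        # append=True behaves like "tee -a"; the default overwrites each teefile.
        mode = "a" if append else "w"
        self.tee_files = []
        try:
            for name in tee_filenames:
                self.tee_files.append(open(name, mode))
        except OSError:
            # Close any teefiles that were already opened, then re-raise.
            self.close()
            raise

    def write(self, s):
        sys.stdout.write(s)
        for f in self.tee_files:
            f.write(s)

    def writeln(self, s):
        self.write(s + "\n")

    def close(self):
        for f in self.tee_files:
            f.close()
        self.tee_files = []
```

With this sketch, MultiTee("a.out", "b.out") would overwrite both files, while MultiTee("a.out", append=True) would add to the end of a.out.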

Ideas for the use of this Tee class and programs using it:

- the obvious one - use it like the Unix tee, to both make a copy of some program's output in a file, and show the same output on the screen. We could even pipe the screen (i.e. stdout) output to a Python (or other) text file pager :-)

- to capture intermediate output of some of the commands in a pipeline, before the later commands change it. For another way of doing that, see:

Using PipeController to run a pipe incrementally

- use it to make multiple copies of a file, by implementing the Unix tee command's multiple output file option in the Tee class.

Then we can even use it like this, so we don't get any screen output, and also copy some data to multiple files in a single step:

program_using_tee_class.py >/dev/null # or >NUL if on Windows.

Assuming that multiple teefiles were specified when creating the Tee object that the program will use, this will cause multiple copies of the program's output to be made in the specified teefiles, while the screen output will be thrown away. In other words, it will act like a command to copy some data (the output of the Python program) to multiple locations at the same time, e.g. one could be in a directory on your hard disk, another could be on a USB thumb/pen drive, a third could be on a network share, etc. The advantage here is that by copying from the source only once, to multiple destinations, we avoid reading or generating the data multiple times, once for each destination. This can be more efficient, particularly for large outputs / copies.
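
The "read once, write many times" idea can be sketched independently of the Tee class. The function below is only an illustration under stated assumptions: the name copy_to_many and the chunk_size parameter are hypothetical, and the source and destinations are assumed to be already-open file-like objects, all in binary mode or all in text mode.

```python
def copy_to_many(source, destinations, chunk_size=64 * 1024):
    """Read source once, writing each chunk to every destination.

    source and each item of destinations are open file-like objects.
    Returns the total number of bytes (or characters, in text mode) read.
    """
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            # An empty read means end of input.
            break
        for dest in destinations:
            dest.write(chunk)
        total += len(chunk)
    return total
```

For n destinations this does one pass over the input and n writes per chunk, instead of n separate read-and-write passes. Whether that is actually faster for a given set of devices is something to measure.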

For more fun Unixy / Pythonic stdin / stdout / pipe stuff, check out:

[xtopdf] PDFWriter can create PDF from standard input

and a follow-up post, that shows how to use the StdinToPDF program in that post, along with my selpg Unix C utility, to print only selected pages of text to PDF:

Print selected text pages to PDF with Python, selpg and xtopdf on Linux

Enjoy your tea :)

- Vasudev Ram - Online Python training and consulting





6 comments:

  1. Why not just use logging? Does the same thing, but not limited to two outputs.

  2. Looks like a nice idea, but does seems to have a real performance benefit, it won't be quicker than copying the file to the destination you need.

  3. @Marc: Good point. I do know about logging, of course, in fact just recently recommended to a client that we use it in a multi-threaded Python project we are doing, because the logging output is thread-safe, whereas output of plain prints can be wrongly interleaved, at least if you output multiple values in a single print call (seems so, based on an experiment I did).

    This was just a fun experiment. Also, multiple outputs can be implemented in this Tee class as well, as I said in the post; we just have to pass more file objects as arguments to the constructor, and write to all of them in the write method. But it is not meant to replace the logging module by any means; logging has many other useful features.

  4. @fruch: Thanks. By "does seems" I guess you meant "does not seem". I think you did not get my point, which was that with this method, we read or generate the input only once but copy it to many places, so it should be faster, leaving aside the fact that a native binary copy command (e.g. COPY or XCOPY in Windows, or cp in Unix) may be somewhat faster than a Python program. With the OS copy command, you would have to read the input for each copy you make. So for n copies, my method works out to 1 read or generate plus n writes, whereas the OS copy needs n reads and n writes. That is where the benefit comes from, as I said in the post. The only con may be that the native copy has somewhat higher throughput, since those commands may be optimized, but the Python one may not be much slower, since file I/O in Python is done using compiled C code under the hood - a lot of the Python file handling functions map directly to corresponding functions from C's stdio library, if I'm not mistaken. So file I/O in Python should run at near C speed. But as with any performance issue, the only right way is to measure and measure again.


  5. Just pointing out that UNIX tee can write to more than one file at a time. Also you may want to have your class implement a ContextManager to ease closing of file handles


  6. @Ravoori: Thanks for your comment. I did mention that ability of Unix tee in the post - see below the code. I'm not always in favor of context managers, though I implemented one for my xtopdf toolkit:

    [xtopdf] PDFWriter now has context manager support:

    https://jugad2.blogspot.in/2013/12/xtopdf-pdfwriter-now-has-context.html
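
For readers curious about the context manager suggestion made in the comments, a minimal sketch could look like the following. It is not part of tee.py: ContextTee is a hypothetical name, the class repeats the write methods from the post so that the example is self-contained, and it lets I/O errors propagate instead of calling error_exit.

```python
import sys


class ContextTee(object):
    """Tee-like class usable in a with statement (hypothetical name)."""

    def __init__(self, tee_filename):
        self.tee_fil = open(tee_filename, "w")

    def write(self, s):
        sys.stdout.write(s)
        self.tee_fil.write(s)

    def writeln(self, s):
        self.write(s + "\n")

    def close(self):
        self.tee_fil.close()

    def __enter__(self):
        # The object bound by "with ... as tee" is the ContextTee itself.
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Always close the teefile, even if the with block raised.
        self.close()
        return False  # do not suppress exceptions from the with block
```

It would be used as "with ContextTee(filename) as tee:" followed by calls to tee.write() or tee.writeln(); the teefile is closed when the block ends, so no explicit close() call is needed.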

