Tuesday, March 29, 2016

Python one-liner to compare two files (conditions apply)

By Vasudev Ram

In my previous post a couple of days back:

A basic file compare utility in Python

I said that it was possible to write a shorter version of this program, subject to certain limitations.

You can do it with a one-liner. The limitation is that the sum of the sizes of both files being compared should be less than your computer's free memory [1]. This is because I read both files fully into memory [2] to compare them [3].

First, the input files: three of the ones used in the previous post linked above.
$ type f0.txt
file 1

$ type f1.txt
file 1

$ type f2.txt
file 2
And here is the Python one-liner, run on a pair of identical files and then on a pair of differing files:
$ python -c "print(open('f0.txt', 'rb').read() == open('f1.txt', 'rb').read())"
True

$ python -c "print(open('f0.txt', 'rb').read() == open('f2.txt', 'rb').read())"
False
Voila!

Note that you have to use the file opening mode of 'rb' (for 'read' and 'binary'), not just 'r', because otherwise Python will do newline translation and your result can be wrong, at least on Windows. I actually had this happen and then corrected the script.
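
To see why the mode matters, here is a minimal sketch, assuming Python 3, where text mode applies universal newline translation on every platform and not only on Windows. The helper names same_bytes, same_text and demo are hypothetical, made up for this illustration; they are not part of file_compare.py.

```python
import os
import shutil
import tempfile


def same_bytes(path_a, path_b):
    """Slurp both files in binary mode and compare the raw bytes."""
    with open(path_a, 'rb') as fa, open(path_b, 'rb') as fb:
        return fa.read() == fb.read()


def same_text(path_a, path_b):
    """Slurp both files in text mode; newline translation is applied on read."""
    with open(path_a, 'r') as fa, open(path_b, 'r') as fb:
        return fa.read() == fb.read()


def demo():
    """Compare a file with Unix line endings against one with DOS line endings."""
    tmpdir = tempfile.mkdtemp()
    try:
        unix_path = os.path.join(tmpdir, 'unix.txt')
        dos_path = os.path.join(tmpdir, 'dos.txt')
        # Write in binary mode so the line endings land on disk exactly as given.
        with open(unix_path, 'wb') as f:
            f.write(b'file 1\n')
        with open(dos_path, 'wb') as f:
            f.write(b'file 1\r\n')
        return same_bytes(unix_path, dos_path), same_text(unix_path, dos_path)
    finally:
        # Clean up the temporary directory and both files.
        shutil.rmtree(tmpdir)


if __name__ == '__main__':
    print(demo())
```

Under Python 3, demo() should return (False, True): the two files differ by one byte, but the text-mode comparison hides that difference.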

[1] The free memory is what is left after subtracting from your total RAM the space taken by the OS, buffer cache, other running apps and their data, the Python interpreter, and your script. So if you have 4 GB RAM, and the total memory used by those items is 2.4 GB, you have 1.6 GB free, so the total size of files you can compare, if they are of equal size, is two files of 0.8 GB each. [4]

[2] Perl people call this technique slurping a file, and use it a lot, when feasible.

[3] Of course, with this technique we lose the extra info that the original program (file_compare.py) gives us, such as why the input files differ (e.g. in size or content).

[4] Not being 100% precise here, because Python data structures have some overhead. See the sys.getsizeof() function.
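
As a rough illustration of footnote [4], here is a small sketch, assuming CPython 3, that uses sys.getsizeof() to compare the length of a bytes object (what read() returns in 'rb' mode there) with the memory the object itself occupies. The function name slurp_overhead is hypothetical, and the exact numbers depend on your interpreter version and platform, so none are shown here.

```python
import sys


def slurp_overhead(payload):
    """Return (content length, in-memory object size, difference) in bytes."""
    n = len(payload)                # bytes of actual file content
    total = sys.getsizeof(payload)  # size of the Python object holding it
    return n, total, total - n


if __name__ == '__main__':
    # Stand-in for the result of open(path, 'rb').read()
    data = b'x' * 1000
    print(slurp_overhead(data))
```

The difference is the per-object overhead the footnote refers to. For large files it should be small relative to the content itself, so the free-memory estimate in [1] remains a reasonable approximation.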

If you like one-liners, here are some more, some by me, some by others:

UNIX one-liner to kill a hanging Firefox process
(This one has some interesting comments on Unix processes.)

Python one-liner to open a web site from the command line

A first Python one-liner

Multiple Python one-liners

Python one-liner to get the filename and line number of the caller of the current function

- Vasudev Ram - Online Python training and programming


2 comments:

Ravoori said...

How about using the standard library filecmp module https://docs.python.org/2/library/filecmp.html#filecmp.cmp

Vasudev Ram said...

See the comments on my previous post, linked to at the top of this post.