Programs that count the frequencies of words in a string or a file are common Python examples, and they are usually written using dicts. Here is a small program that counts the frequencies of lines in its input instead. This functionality has some practical uses. I will show those later, and also compare and contrast this program with other tools.
The program uses an OrderedDict from the collections module of the Python standard library, so that the counts are reported in the order in which each distinct line first appears.
The program could also be written with a regular dict, a defaultdict (also from the collections module), or a collections.Counter, with slightly different code in each case.
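For comparison, here is a minimal sketch of those three alternatives for the counting step alone. The function names are hypothetical and not part of linefreq.py. One caveat: a plain dict (and therefore a Counter, which is a dict subclass) only guarantees insertion order from Python 3.7 on, which is one reason to use OrderedDict when the order of first appearance matters on older versions.

```python
from collections import Counter, defaultdict


def count_with_dict(lines):
    # Plain dict: dict.get supplies 0 for a line not seen yet.
    counts = {}
    for line in lines:
        counts[line] = counts.get(line, 0) + 1
    return counts


def count_with_defaultdict(lines):
    # defaultdict(int) creates a 0 entry the first time a missing key is used.
    counts = defaultdict(int)
    for line in lines:
        counts[line] += 1
    return counts


def count_with_counter(lines):
    # Counter does the counting itself when given an iterable.
    return Counter(lines)


if __name__ == "__main__":
    sample = ["line 1\n", "line 2\n", "line 2\n"]
    for fn in (count_with_dict, count_with_defaultdict, count_with_counter):
        print(fn.__name__, dict(fn(sample)))
```

Counter also offers most_common(), which orders by count rather than by first appearance.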
Here is the program, linefreq.py:

```python
"""
linefreq.py
A program to find the frequencies of input lines.
Author: Vasudev Ram
Copyright 2016 Vasudev Ram
Web site: https://vasudevram.github.io
Blog: http://jugad2.blogspot.com
Product store: http://gumroad.com/vasudevram
"""

from __future__ import print_function

import sys
from collections import OrderedDict


def linefreq(in_fil):
    # Count each distinct line, remembering the order in which lines first appear.
    counts = OrderedDict()
    for line in in_fil:
        counts[line] = counts.get(line, 0) + 1
    if not counts:
        # Empty input: nothing to report (and max() below would fail on it).
        return
    print("Freq".rjust(8) + ": Line")
    for line, freq in counts.items():
        # Each line keeps its own trailing newline, so suppress print's.
        print(str(freq).rjust(8) + ": " + line, end="")
    # Separator row of dashes, sized from the longest input line.
    print('-' * (10 + max(map(len, counts))))
    # Print the same table again, in reverse order of first appearance.
    for line, freq in reversed(counts.items()):
        print(str(freq).rjust(8) + ": " + line, end="")


def main():
    sa, lsa = sys.argv, len(sys.argv)
    if lsa == 1:
        # No filename argument: read from standard input.
        linefreq(sys.stdin)
    elif lsa == 2:
        with open(sa[1], "r") as in_fil:
            linefreq(in_fil)
    else:
        print("Only one filename argument supported.")


if __name__ == '__main__':
    main()
```

I ran it on this input file:
```
line 1
line 2
line 2
line 3
line 3
line 3
line 4
line 4
line 4
line 4
```

where "line 1" occurs once, "line 2" occurs twice, and so on, with this command:
```
$ python linefreq.py infile1.txt
```

and got this output:
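```
    Freq: Line
       1: line 1
       2: line 2
       3: line 3
       4: line 4
-----------------
       4: line 4
       3: line 3
       2: line 2
       1: line 1
```

The lines are printed a second time, in reverse order, just to show that it is possible to use reversed() on an OrderedDict. In the Python 3 versions current when this was written, that did not work on a regular dict; dicts and their views only became reversible in Python 3.8.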
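Here is a minimal sketch of reverse iteration over an OrderedDict, with a fallback for plain dicts on older Pythons. The sample counts are made up for illustration. It assumes a Python 3 interpreter, where OrderedDict views support reversed() and plain dicts are reversible from 3.8 on.

```python
import sys
from collections import OrderedDict

# Made-up counts, in order of first appearance.
counts = OrderedDict([("line 1\n", 1), ("line 2\n", 2), ("line 3\n", 3)])

# OrderedDict views support reversed() directly.
backwards = list(reversed(counts.items()))

# A plain dict copy keeps insertion order (guaranteed from Python 3.7).
plain = dict(counts)
if sys.version_info >= (3, 8):
    # Python 3.8+: dicts and their views are reversible.
    plain_backwards = list(reversed(plain.items()))
else:
    # Older versions: materialize the items into a list first.
    plain_backwards = list(reversed(list(plain.items())))

for line, freq in backwards:
    print(str(freq).rjust(8) + ": " + line, end="")
print(plain_backwards == backwards)
```

Materializing a list costs extra memory proportional to the number of distinct lines, which is why reversing the view directly is preferable where it is supported.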
I also got the same output, as expected, when I ran this form of the command:
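```
$ cat infile1.txt | python linefreq.py
```

This line:

```python
print('-' * (10 + max(map(len, counts))))
```

prints a separator row of dashes between the two tables. Iterating over a dict (or an OrderedDict) yields its keys, so max(map(len, counts)) is the length of the longest distinct input line, and the extra 10 covers the 8-character right-justified frequency column plus the ": " separator. Since each counted line still includes its trailing newline, the row comes out one character longer than the longest printed line.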
The length of the longest line could also be computed inline, in the first for loop (the one that does the counting), instead of in a separate pass over the keys.
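A minimal sketch of that inline approach follows. The helper name count_lines is hypothetical, and io.StringIO stands in for a file or sys.stdin, since the counting loop only needs an iterable of lines.

```python
import io
from collections import OrderedDict


def count_lines(in_fil):
    """Hypothetical helper: count lines and track the longest one in one pass."""
    counts = OrderedDict()
    max_len = 0
    for line in in_fil:
        counts[line] = counts.get(line, 0) + 1
        # Same measure as max(map(len, counts)): includes the newline, if any.
        max_len = max(max_len, len(line))
    return counts, max_len


if __name__ == "__main__":
    sample = io.StringIO(u"line 1\nline 2\nline 2\nline 33\n")
    counts, max_len = count_lines(sample)
    # The inline result agrees with the separate max() pass.
    print(max_len == max(map(len, counts)))
    print('-' * (10 + max_len))
```

The inline version looks at every input line, duplicates included, while max(map(len, counts)) only looks at distinct lines; both give the same maximum. For typical inputs the saved pass is negligible, so the separate max() call is arguably the clearer choice.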
- Vasudev Ram - Online Python training and consulting