Tuesday, August 30, 2016

file_sizes utility in D: print sizes of all files under a directory tree

By Vasudev Ram




Here is a command-line utility written in D (Dlang) that finds and prints the names and sizes of all regular files (i.e. files that are not directories) under a directory subtree, with the total file count and total size at the end. It is called file_sizes.d. It can be compiled with:
$ dmd file_sizes.d
and run with:
$ file_sizes dirName
Here is the code for file_sizes.d:
/*********************************************************************
File: file_sizes.d
---------------------------------------------
Author: Vasudev Ram
Copyright 2016 Vasudev Ram
Web site: https://vasudevram.github.io
Blog: http://jugad2.blogspot.com
Product store: https://gumroad.com/vasudevram
---------------------------------------------
Purpose: To find the sizes of all files (recursively, including in 
subdirectories) under a given directory tree.
Compile with:

$ dmd file_sizes.d

Run with:

$ file_sizes dirName

Description: To find the sizes of all files under a given directory
tree. The program will print both the name of the file and the file size
in bytes, separated by a tab character, one file per line. At the end,
it will also print the total number of files, and sum of their sizes.

*********************************************************************/

import std.stdio;
import std.file;
import std.uni;

void usage(string[] args) {
    stderr.writeln("Usage: ", args[0], " dirName");
    stderr.writeln(
        "Recursively find and print names and " ~
        "sizes of all files under dirName.");
}

int main(string[] args) {
    try {
        if (args.length != 2) {
            usage(args);
            return 1;
        }
        string dirName = args[1];
        // Check that dirName is not NUL or CON (DOS device names).
        if (dirName.toUpper() == "NUL" || dirName.toUpper() == "CON" ) {
            stderr.writeln("Error: ", dirName, " is not a directory. Exiting.");
            return 1;
        }
        if (!exists(dirName)) {
            stderr.writeln("Error: ", dirName, " not found. Exiting.");
            return 1;
        }
        // Check if dirName is actually a directory.
        if (!DirEntry(dirName).isDir()) {
            stderr.writeln("Error: ", dirName, " is not a directory. Exiting.");
            return 1;
        }
        ulong file_count = 0;
        ulong total_size = 0;
        ulong size;
        foreach(DirEntry de; dirEntries(dirName, SpanMode.breadth)) {
            // isFile() alone is enough here: on POSIX it is true only for
            // regular files, and on Windows it is defined as !isDir().
            if (de.isFile()) {
                file_count += 1;
                size = getSize(de.name());
                total_size += size;
                writeln(de.name(), "\t", size);
            }
        }
        writeln("Directory: ", args[1], "\tFiles: ", file_count, 
            " Size: ", total_size);

    } catch (FileException fe) {
        stderr.writeln("Got a FileException: ", fe.toString(), 
        "\n. Errno: ", fe.errno, ". Exiting.");
        return 1;
    } catch (Exception e) {
        stderr.writeln("Got an Exception: ", e.toString(), 
        "\n. Exiting.");
        return 1;
    }
    return 0;
}
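The device-name check in main() covers only NUL and CON. Windows reserves a few more device names (PRN, AUX, COM1 to COM9 and LPT1 to LPT9), so a broader check could look like the sketch below. The function name isDosDeviceName is hypothetical and not part of file_sizes.d, and, like the original check, it only compares the whole argument, not the last component of a longer path.

```d
import std.algorithm.searching : canFind;
import std.stdio : writeln;
import std.uni : toUpper;

/// Hypothetical helper (not in file_sizes.d): returns true if name is
/// one of the reserved DOS/Windows device names, ignoring case.
bool isDosDeviceName(string name) {
    static immutable string[] reserved = [
        "CON", "PRN", "AUX", "NUL",
        "COM1", "COM2", "COM3", "COM4", "COM5",
        "COM6", "COM7", "COM8", "COM9",
        "LPT1", "LPT2", "LPT3", "LPT4", "LPT5",
        "LPT6", "LPT7", "LPT8", "LPT9",
    ];
    return reserved[].canFind(name.toUpper());
}

void main(string[] args) {
    // Report on each command-line argument.
    foreach (arg; args[1 .. $]) {
        writeln(arg, ": ", isDosDeviceName(arg) ? "device name" : "ordinary name");
    }
}
```

In file_sizes.d, the two toUpper() comparisons could then be replaced by a single call to such a function.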
Here are two example runs of file_sizes:
$ file_sizes test_dir
test_dir\a1	380
test_dir\a2	1215
test_dir\dir1\b1	10894
test_dir\dir1\b2	3871
Directory: test_dir	Files: 4 Size: 16360

$ file_sizes d:\temp | grep Directory
Directory: d:\temp      Files: 2232 Size: 275511672
In the 2nd run above, I use grep to filter out all the per-file lines (the file names and sizes), so that only the summary/total line is shown, which is convenient when that is all you are interested in. Also, in the detail lines, the name and size are separated by a tab character, so that the output is easy to process with Unix filters like sed and awk.

I tested that by piping the output of a few runs of the program to an awk script that calculated the total by summing up column 2 (the sizes). That was before I added code for the total line to the D program itself. You can still pipe the output to awk, etc., to do any further processing on only the detail lines, by first piping it through "grep -v Directory:" (which will work unless a file name or path in the output contains that string).
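The same column-summing check can also be sketched in D itself, as a small filter that reads file_sizes output on standard input. This is only an illustration: sumSizes is a hypothetical name, and the sketch assumes that the size is whatever follows the last tab on a detail line and that the summary line starts with "Directory: ".

```d
import std.algorithm.searching : startsWith;
import std.conv : to;
import std.stdio : stdin, writeln;
import std.string : lastIndexOf, strip;

/// Hypothetical helper: sums the size column of file_sizes detail lines.
/// Works on any range of lines (e.g. stdin.byLine or a string array).
ulong sumSizes(R)(R lines) {
    ulong total = 0;
    foreach (line; lines) {
        // Skip the summary line printed at the end of a run.
        if (line.startsWith("Directory: ")) continue;
        // The size follows the last tab; names may contain spaces.
        auto tabPos = line.lastIndexOf('\t');
        if (tabPos < 0) continue; // not a detail line
        total += line[tabPos + 1 .. $].strip.to!ulong;
    }
    return total;
}

void main() {
    writeln("Total: ", sumSizes(stdin.byLine));
}
```

If this were saved as, say, sum_sizes.d and compiled, it could be used as "file_sizes test_dir | sum_sizes". Like the grep approach, it would skip a detail line whose path happens to start with "Directory: ".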

file_sizes runs fairly fast on my machine.

The image at the top of the post is of a stack of Manila paper, from which Manila folders are made. Physical file folders were the inspiration for folders in computer file systems (a.k.a. directories).

- Enjoy.




1 comment:

Vasudev Ram said...

Obviously, this same task can be done in some other ways, such as:

Do a "dir /s" and then process its output, but then you have to take care to filter out the non-file-info lines.

Use a utility like Unix's du (but that is sort of what this one, file_sizes, is, though with fewer features than du).

Use a GUI tool or just Windows Explorer.

As usual, the advantage of writing your own tool is that you can customize it the way you want, you can make it a function that can be called from your other programs, etc. ...
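As a rough sketch of the "function that can be called from your other programs" idea, the counting loop could be moved into a function that returns the totals instead of printing them. The names TreeStats and treeStats are hypothetical and not from file_sizes.d; the sketch uses the size property of DirEntry rather than getSize(), and leaves error handling (for example, the FileException that dirEntries throws for a non-directory) to the caller.

```d
import std.file : dirEntries, SpanMode;
import std.stdio : writeln;

/// Hypothetical result type: number of regular files and their total size.
struct TreeStats {
    ulong fileCount;
    ulong totalSize;
}

/// Hypothetical helper: walks dirName recursively and accumulates
/// the count and total size in bytes of all files found.
TreeStats treeStats(string dirName) {
    TreeStats stats;
    foreach (de; dirEntries(dirName, SpanMode.breadth)) {
        if (de.isFile) {
            stats.fileCount += 1;
            stats.totalSize += de.size;
        }
    }
    return stats;
}

void main(string[] args) {
    // Default to the current directory if no argument is given.
    auto dirName = args.length > 1 ? args[1] : ".";
    auto stats = treeStats(dirName);
    writeln("Directory: ", dirName, "\tFiles: ", stats.fileCount,
        " Size: ", stats.totalSize);
}
```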