Friday, May 8, 2015

tabtospaces, utility to change tabs to spaces in Python files

By Vasudev Ram




Near the end of a recent blog post, "asciiflow.com: Draw flowcharts online, in ASCII", I showed how this small snippet of Python code can be used to make a Python program usable as a component in a Unix pipeline:
for lin in sys.stdin:
    sys.stdout.write(process(lin))
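
For context, a complete filter built around that loop might look like the sketch below. The process function here is a hypothetical placeholder (it just upper-cases each line), and the main wrapper with its infile/outfile parameters is an assumption added for illustration; neither comes from the earlier post.

```python
import sys


def process(line):
    # Hypothetical transformation, chosen only for illustration:
    # upper-case the line; the trailing newline is left as it is.
    return line.upper()


def main(infile=sys.stdin, outfile=sys.stdout):
    # Read the input line by line and write each processed line to
    # the output, so the script can sit in the middle of a pipeline.
    for lin in infile:
        outfile.write(process(lin))


if __name__ == "__main__":
    main()
```

Any function that takes a line and returns a string could stand in for process.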


Today I saw Raymond Hettinger (@raymondh)'s tweet about the -t and -tt command line options of Python:
#python tip: In Python 2, the -tt option raises an error when you foolishly mix spaces and tabs. In Python 3, that is always an error.
That made me think of writing a simple Python 2 tool to change the tabs in a Python file to spaces. Yes, I know it can be easily done in Unix or Windix [1] with any of sed / awk / tr etc. That's not the point. So here is tabtospaces.py:
import sys
for lin in sys.stdin:
    sys.stdout.write(lin.replace("\t", "    "))
[ Note: this code converts each tab into 4 spaces. It can be parameterized by passing a command-line option that specifies the number of spaces, such as 4 or 8, and then replacing each tab with that many spaces. Also note that I have not tested the program on many sets of data, just one for now. ]
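
A minimal sketch of that parameterization follows, assuming the number of spaces is given as an optional first command-line argument and defaults to 4. The function name tabs_to_spaces and the argument handling are illustrative assumptions, and the code still does a plain substitution, as described in the note.

```python
import sys


def tabs_to_spaces(line, width=4):
    # Replace every tab with `width` spaces. This is a plain
    # substitution; it does not align text to tab stops.
    return line.replace("\t", " " * width)


def main(argv=None, infile=sys.stdin, outfile=sys.stdout):
    argv = sys.argv[1:] if argv is None else argv
    # Assumed interface: an optional first argument gives the width.
    width = int(argv[0]) if argv else 4
    for lin in infile:
        outfile.write(tabs_to_spaces(lin, width))


if __name__ == "__main__":
    main()
```

If this were saved as tabtospaces2.py (a hypothetical name), running it with an argument of 8 would replace each tab with 8 spaces instead of 4.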

I created a simple Python file, test1.py, that mixes tabs and spaces in its indentation, to use as input to tabtospaces.py. Then I ran the following commands:
$ py -tt test1.py
  File "test1.py", line 4
    print arg,
              ^
TabError: inconsistent use of tabs and spaces in indentation

$ py tabtospaces.py < test1.py > test2.py

$ py -tt test2.py
0 1 2 3 4 5 6 7 8 9
which shows that, for this input at least, tabtospaces.py does convert the tabs to spaces: the -tt check no longer raises TabError.

And you can see from this diff that the original test1.py and the test2.py generated by tabtospaces.py differ only in the use of tabs vs. spaces:
$ fc /l test1.py test2.py
Comparing files test1.py and TEST2.PY
***** test1.py
    for arg in args:
                print arg,

***** TEST2.PY
    for arg in args:
        print arg,

*****

[1] Windix is the latest upcoming Unix-compatible OS from M$, due Real Soon Now. You heard it here first - TM.

- Vasudev Ram - Online Python training and programming

3 comments:

Daniel Pope said...

Don't use str.replace() for this; this does not reflect how tabs work. They don't expand to exactly n spaces, they expand to the next multiple of n spaces. So this will break working Python code by changing its indentation.

To correctly convert tabs to spaces, use str.expandtab(8) or the reindent.py tool from the Python Hg repo (https://hg.python.org/cpython/file/tip/Tools/scripts/reindent.py).

Vasudev Ram said...

Thanks for your comment.

You're right that using str.replace() may not work if the user has used an inconsistent number of spaces in different parts of the source .py file. I was aware of the way tabs expand (to the next tab stop, such as 0, 8, 16, etc.), and so did think of that issue when first writing the code for tabtospaces.py. But to keep the code simple and the post short, I decided not to handle it then, and to maybe consider adding it in a later post or just mention the possibility (as a limitation of the tool) in the current post. However, I ended up not mentioning it.

BTW, it is str.expandtabs (at least in Python 2), not expandtab. I guess it was a typo.

Good to know about reindent - will check it out.
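
To make the difference discussed in these comments concrete, here is a small sketch comparing str.replace() with str.expandtabs(). The sample strings and the tab size of 4 are arbitrary choices for illustration, not taken from test1.py.

```python
# A tab that follows two characters: the next 4-column tab stop
# is only two columns away.
line = "ab\tcd"

# Fixed substitution: the tab always becomes exactly 4 spaces.
replaced = line.replace("\t", "    ")

# Tab-stop expansion: the tab pads out to the next multiple of 4.
expanded = line.expandtabs(4)

print(repr(replaced))  # 'ab    cd'
print(repr(expanded))  # 'ab  cd'

# The same effect in indentation: two spaces followed by a tab.
indent = "  \tx"
print(len(indent.replace("\t", "    ")) - 1)  # 6 columns of indentation
print(len(indent.expandtabs(4)) - 1)          # 4 columns of indentation
```

When a line's indentation consists only of tabs, the two approaches give the same result; they diverge once spaces precede a tab.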

Vasudev Ram said...

Also, another reason I didn't add support to the tool for handling an inconsistent number of spaces in different parts of the file was that I thought good programmers would not do that. That is, they would only use either a mix of 4 spaces and tabs, or 8 spaces and tabs, exclusively, if they used a mix at all.

And finally, the tool was not really meant as a production tool that had the goal of changing tabs to spaces. It was meant more as an example of the technique of using a Python script in a Unix pipeline (using the for loop over sys.stdin), and that's why I mentioned and linked to my earlier post on that topic (about asciiflow.com), from this post.