Sunday, October 5, 2014

Flattening an arbitrarily nested list in Python

By Vasudev Ram



A while ago, I had written this blog post:

Flatten a list of lists with a list comprehension

which talks about flattening a nested list in Python, but only for a list nested two levels deep. It was not my code - I just had seen it via Hacker News, wrote about it, and commented on an interesting property of the Python language, in the code involved. A few readers also commented on that language property.

Recently, something reminded me of my earlier post. I realized that the code in that post only worked for a list nested two levels deep. So I thought of trying to solve the problem of flattening an arbitrarily nested Python list.

Here is my attempt at the solution, in the file flatten_list2.py [1]:
# Program to flatten an arbitrarily nested list.
#flatten_list2.py
def nested_item(depth, value):
    if depth <= 1:
        return [value]
    else:
        return [nested_item(depth - 1, value)]

def nested_list(n):
    """Generate a nested list where the i'th item is at depth i."""
    lis = []
    for i in range(n):
        if i == 0:
            lis.append(i)
        else:
            lis.append(nested_item(i, i))
    return lis

def flatten(lis):
    """Given a list, possibly nested to any level, return it flattened."""
    new_lis = []
    for item in lis:
        if type(item) == type([]):
            new_lis.extend(flatten(item))
        else:
            new_lis.append(item)
    return new_lis

for n in range(0, 7):
    print n,
    lis = nested_list(n)
    print "original:", lis
    new_lis = flatten(lis)
    print "flattened:", new_lis
    print
The above program has a function called nested_list that, given an argument n, generates a list of n items, as follows: the 0th item is 0, and each succeeding item is a list nested to level i, the i'th number in the range 1 <= i < n. (See the image at the top of the post.)

The function nested_item does the work of building up the items of the outermost list, where each item (except the 0th) is a nested list. And the function flatten does the work of flattening the built-up list of nested lists.

After writing and testing my solution, I found that it works for values of n up to 998, but for n = 999 it terminates with the message "maximum recursion depth exceceded", which is to be expected, since the maximum recursion depth in Python is 1000 by default, though there is a function in the sys module to set it, called setrecursionlimit. There are two recursive functions in the program, the functions called nested_item and flatten. The flatten function causes the recursion error.

Here is the output of a sample run of my program to flatten lists:

$ flatten_list2.py
0 original: []
flattened: []

1 original: [0]
flattened: [0]

2 original: [0, [1]]
flattened: [0, 1]

3 original: [0, [1], [[2]]]
flattened: [0, 1, 2]

4 original: [0, [1], [[2]], [[[3]]]]
flattened: [0, 1, 2, 3]

5 original: [0, [1], [[2]], [[[3]]], [[[[4]]]]]
flattened: [0, 1, 2, 3, 4]

6 original: [0, [1], [[2]], [[[3]]], [[[[4]]]], [[[[[5]]]]]]
flattened: [0, 1, 2, 3, 4, 5]

orig: []
flat: []
orig: [0]
flat: [0]
orig: [0, 1]
flat: [0, 1]
orig: [0, 1, 2]
flat: [0, 1, 2]
orig: [0, 1, 2, 3]
flat: [0, 1, 2, 3]
orig: [0, 1, 2, 3, 4]
flat: [0, 1, 2, 3, 4]
I had also modified the program to try to flatten non-nested lists as a test (since it should work for those too, i.e. not change them), and that is the second section in the output above.

[1] 42 was the number of lines in my earlier version, flatten_list.py (not shown here), which I then refactored into flatten_list2.py, improving the code some in the process, and also bringing the number of lines down to 37 (not counting the second section of code, added later, that tests the flatten function on lists that are not nested).

- Vasudev Ram - Dancing Bison Enterprises

Click here to signup for email notifications about new products and services from Vasudev Ram.

Contact Page

9 comments:

Henry Harrison said...

In Python 3 I think you could this elegantly using "yield from".

from collections.abc import Iterable

def flatten(iter):
for item in iter:
if isinstance(item, Iterable):
yield from yield_nested(item)
else:
yield item

Anonymous said...

Of course in Python 3.3+ the natural way to do this would be:
>>> o = [0, [1], [[2]], [[[3]]], [[[[4]]]], [[[[[5]]]]]]
>>> def yielder(x):
... for y in x:
... if isinstance(y, list):
... yield from yielder(y)
... else:
... yield y
...
>>> list(yielder(o))
[0, 1, 2, 3, 4, 5]

Vasudev Ram said...

Thanks to all who commented. Interesting solutions.

Unknown said...

There are a lot of fast algorithm to funcional programming in fn.py

Read the flatten test case

Vasudev Ram said...

@Robson: Will take a look at fn.py, thanks.

Also noticed that many of the other solutions in the comments use iterables (so as to handle tuples, etc.) which is more general than just handling lists. Good point.

And using yield, i.e. generators, will use less memory (e.g. fetching items from the result list only on demand) if the list is large; OTOH, one can't re-traverse a generator or do random access of its items, which is possible with a list. Trade-offs ...

Vasudev Ram said...

Also, using isinstance() instead of type() means the solutions will work even if some of the iterables are subclasses of list or tuple or other built-in iterable type.

Vasudev Ram said...

After searching some more, I also found this StackOverflow post which has some solutions to the problem:

http://stackoverflow.com/questions/952914/making-a-flat-list-out-of-list-of-lists-in-python/

Alex Martelli (the martellibot) is one of the top answerers of that question, and as usual, goes into a lot of detail about it.

There is also a comment on that SO post that links to this page, saying it has some more interesting solutions:

http://rightfootin.blogspot.com/2006/09/more-on-python-flatten.html

Anonymous said...

Will this work? (where X is the list)

`X`.replace('[','').replace(']','').split(',')

Vasudev Ram said...

"Will that work?" My answer to that is:

"Only one way to find out." TM.

i.e.: Try it. :-)

Point is, that's the only true test of whether something works or not - trying it. (Okay, a mathematical proof is another test, but it does not apply or is not practical in many cases.)

But I did get what your solution is doing - convert the list to a string representation (using the backquotes), remove all the square brackets, split the resulting string on comma as delimiter, which yields a list again, without any nesting. But the split method of strings yields a list of strings.

When I try it in the Python interpreter, I get:

>>>lis = [ 0, [1], [[2]], [[[3]]], [[[[4]]]] ]

>>>`lis`
'[0, [1], [[2]], [[[3]]], [[[[4]]]]]'

>>>new_lis = `lis`.replace('[', '').replace(']', '').split(',')

>>>new_lis
['0', ' 1', ' 2', ' 3', ' 4']
>>>

which is not the solution we want, because, though the list is now flattened, each original non-list item has now become a string, whether it was a string or not earlier.

Creative solution, though ...