Friday, March 8, 2013

Python TableFu aims to become an ORM for spreadsheets

By Vasudev Ram

Python TableFu is an interesting tool I saw recently. It is a Python library that lets you import tabular data from CSV files and then manipulate them in memory, by calling its methods.

From the site:

[ Python TableFu is a tool for manipulating spreadsheet-like tables in Python. It began as a Python implementation of ProPublica's TableFu, though new methods have been added. TableFu allows filtering, faceting and manipulating of data. Going forward, the project aims to create something akin to an ORM for spreadsheets. ]

A CSV file you specify, is used as input to create a TableFu object, and the rows are available as a list, which can be indexed to get a specific row. The columns (of each row) are available as a dictionary, so you can say table[column_name], e.g. table["Author] to get a specific column. The model seems to make sense, since table rows usually don't have specific names, they are treated as "row number so-and-so", while columns do have names - the header of the column, if it is a CSV file, or the column name of a database table, using RDBMS terminology.

Python TableFu allows filtering, "faceting" and manipulation of the table data.

Here is some example Python TableFu code, from the site; I've deleted some bits to keep it short:

>>> from table_fu import TableFu
>>> table = TableFu.from_file('tests/test.csv')
>>> table.columns
['Author', 'Best Book', 'Number of Pages', 'Style']

# get all authors
>>> table.values('Author')
['Samuel Beckett', 'James Joyce', 'Nicholson Baker', 'Vladimir Sorokin']

# total a column
>>> table.total('Number of Pages')
1177.0

# filtering a table returns a new instance
>>> t2 = table.filter(Style='Modernism')
>>> list(t2)
[Row: Samuel Beckett, Malone Muert, 120, Modernism,
 Row: James Joyce, Ulysses, 644, Modernism]


# each TableFu instance acts like a list of rows
>>> table[0]


list(table.rows)
[Row: Samuel Beckett, Malone Muert, 120, Modernism,
 Row: James Joyce, Ulysses, 644, Modernism,
 Row: Nicholson Baker, Mezannine, 150, Minimalism,
 Row: Vladimir Sorokin, The Queue, 263, Satire]

# rows, in turn, act like dictionaries
>>> row = table[1]
>>> print row['Author']
James Joyce

# transpose a table
>>> t2 = table.transpose()
>>> list(t2)
[Row: Best Book, Malone Muert, Ulysses, Mezannine, The Queue,
 Row: Number of Pages, 120, 644, 150, 263,
 Row: Style, Modernism, Modernism, Minimalism, Satire]

>>> t2.columns
['Author',
 'Samuel Beckett',
 'James Joyce',
 'Nicholson Baker',
 'Vladimir Sorokin']

# sort rows
>>> table.sort('Author')
>>> table.rows
[Row: James Joyce, Ulysses, 644, Modernism,
 Row: Nicholson Baker, Mezannine, 150, Minimalism,
 Row: Samuel Beckett, Malone Muert, 120, Modernism,
 Row: Vladimir Sorokin, The Queue, 263, Satire]

(I deleted the angle brackets from the original output (that demarcates Rows) because it was messing up the HTML formatting, due to being interpreted as HTML elements.)

Python TableFu looks like it could be useful to manipulate CSV data in memory, before writing it out to another file or sending it to another process (or function in the same program) for further processing.

- Vasudev Ram - Dancing Bison Enterprises



No comments: