Inferno is an open-source Python MapReduce library. It has (from the site):
[ A query language for large amounts of structured text (CSV, JSON, etc.).
A continuous and scheduled MapReduce daemon with an HTTP interface that automatically launches MapReduce jobs to handle a constant stream of incoming data. ]
Overview of Inferno.
This overview page has a nice step-by-step example. Starting with a small set of test data, it shows how to query for a certain result in SQL and then in AWK (both are easy one-liners), and then goes on to show how to achieve the same result using Inferno.
The interesting point is that the Inferno code is also small (a "rule" of ~10 lines, presumably stored in a config file) and a one-line command, but the difference from the SQL and AWK examples is that this runs a Disco MapReduce job to distribute the work across the nodes on a cluster. There is almost nothing in the Inferno code to indicate that this is a distributed computing MapReduce job.
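To make the contrast concrete, here is a minimal single-process sketch of the map, shuffle and reduce phases behind a query like "total count per user" (in SQL, roughly `SELECT user, SUM(count) FROM data GROUP BY user`). The CSV data, the field names and the functions `map_row`, `reduce_values` and `run_mapreduce` are all hypothetical. This is plain Python, not Inferno's rule syntax or Disco's API. Those frameworks run the same phases spread across the nodes of a cluster.

```python
import csv
import io
from collections import defaultdict
from itertools import chain

# Hypothetical sample data (not taken from the Inferno docs).
SAMPLE_CSV = """date,user,count
2013-01-01,alice,3
2013-01-01,bob,5
2013-01-02,alice,2
"""

def map_row(row):
    """Map phase: emit a (key, value) pair for one input record."""
    yield row["user"], int(row["count"])

def reduce_values(key, values):
    """Reduce phase: combine all values that share the same key."""
    return key, sum(values)

def run_mapreduce(rows, mapper, reducer):
    # Shuffle phase: group the mapper output by key. In a distributed
    # framework this grouping moves data between nodes; here it is a dict.
    groups = defaultdict(list)
    for key, value in chain.from_iterable(mapper(r) for r in rows):
        groups[key].append(value)
    # Reduce each group, iterating over keys in sorted order.
    return dict(reducer(k, vs) for k, vs in sorted(groups.items()))

rows = csv.DictReader(io.StringIO(SAMPLE_CSV))
result = run_mapreduce(rows, map_row, reduce_values)
print(result)  # {'alice': 5, 'bob': 5}
```

The point of splitting the work this way is that `map_row` and `reduce_values` only ever see one record or one key's values at a time, so a framework can run many copies of them in parallel without the user code changing.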
Inferno uses Disco.
Disco is "a distributed computing framework based on the MapReduce paradigm. Disco is open-source; developed by Nokia Research Center to solve real problems in handling massive amounts of data."
Some users of Disco include Chango, Nokia and Zemanta. Chango staff seem to be the developers of Disco.
- Vasudev Ram - Dancing Bison Enterprises