Friday, April 19, 2013

sqlite3dbm, an SQLite-backed dbm module


By Vasudev Ram

I came across sqlite3dbm today. It appears to be hosted on Yelp's GitHub account.

They created it as a tool to help with Hadoop work on Amazon EMR (Elastic MapReduce).

sqlite3dbm provides a SQLite-backed dictionary conforming to the dbm interface, along with a shelve class that wraps the dict and provides serialization for it.
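To make the "shelve wraps a dict and adds serialization" idea concrete, here is a small sketch using Python's standard-library shelve module, not sqlite3dbm's own shelve class (which I have not reproduced here). A plain in-memory dict stands in for the dbm-style store. The names backing, shelf, raw and restored are just for illustration.

```python
import pickle
import shelve

# Stand-in for any dbm-style store: it only ever sees bytes keys and bytes values.
backing = {}

# shelve.Shelf wraps a dict-like object and pickles values on the way in.
shelf = shelve.Shelf(backing)
shelf["point"] = {"x": 1, "y": [2, 3]}

# What the underlying store actually holds: an encoded key and pickled bytes.
raw = backing[b"point"]

# Reading through the shelf unpickles the value again.
restored = shelf["point"]
shelf.close()

print(type(raw).__name__, pickle.loads(raw) == restored)
```

The point is the division of labour: the inner store handles persistence of flat keys and values, and the shelf layer handles turning arbitrary Python objects into something the store can hold.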

I tried it out, and it worked as advertised.

How to use sqlite3dbm:

Import the module and call its open() function to create an SQLite database. That returns a handle to the database (let's call it "db"), and you then use Python dict syntax on db to store data.

Later, in the same program or a different one, you can fetch or modify that data, again using dict syntax.
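The store-then-reopen pattern described above can be sketched with only the standard library. This is not sqlite3dbm's code or its exact API; SqliteDict, the kv table and the file name demo.sqlite3 are hypothetical names I made up to show how a dict interface can sit on top of an SQLite file. It stores string keys and string values only.

```python
import os
import sqlite3
import tempfile
from collections.abc import MutableMapping


class SqliteDict(MutableMapping):
    """Hypothetical stand-in: a dict-like view over one SQLite table."""

    def __init__(self, path):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT)"
        )

    def __getitem__(self, key):
        row = self.conn.execute(
            "SELECT value FROM kv WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            raise KeyError(key)
        return row[0]

    def __setitem__(self, key, value):
        self.conn.execute(
            "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)", (key, value)
        )
        self.conn.commit()

    def __delitem__(self, key):
        cur = self.conn.execute("DELETE FROM kv WHERE key = ?", (key,))
        if cur.rowcount == 0:
            raise KeyError(key)
        self.conn.commit()

    def __iter__(self):
        rows = self.conn.execute("SELECT key FROM kv").fetchall()
        return iter([key for (key,) in rows])

    def __len__(self):
        return self.conn.execute("SELECT COUNT(*) FROM kv").fetchone()[0]

    def close(self):
        self.conn.close()


path = os.path.join(tempfile.mkdtemp(), "demo.sqlite3")

# First "program": create the database and store data with dict syntax.
db = SqliteDict(path)
db["foo"] = "bar"
db["lang"] = "Python"
db.close()

# Second "program": reopen the same file, then modify and fetch the data.
db = SqliteDict(path)
db["foo"] = "baz"
fetched = dict(db)
db.close()

print(fetched)
```

With sqlite3dbm itself you would not write such a class; you would get the handle from the module's open() and use it the same way as db here.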

Interesting idea. dbm modules, which implement key-value stores, are less powerful than relational databases (SQL), and were probably developed earlier (think ISAM, etc.), so it looks a bit backwards to implement a dbm-type store on top of SQLite. But the sqlite3dbm project page gives at least some justification for that:

[ This module was born to provide random-access extra data for Hadoop jobs on Amazon’s Elastic Map Reduce (EMR) cluster. We used to use bsddb for this because of its dead-simple dict interface. Unfortunately, bsddb is deprecated for removal from the standard library and also has inter-version compatibility problems that make it not work on EMR. sqlite3 is the obvious alternative for a persistent store, but its powerful SQL interface can be too complex when you just want a dict. Thus, sqlite3dbm was born to provide a simple dictionary API on top of the ubiquitous and easily available sqlite3.

This module requires no setup or configuration once installed. Its goal is a stupid-simple solution whenever a persistent dictionary is desired. ]


- Vasudev Ram - Dancing Bison Enterprises
