Tuesday, December 30, 2014

pyDAL, a pure Python Database Abstraction Layer

By Vasudev Ram


pyDAL is a pure Python Database Abstraction Layer. So it seems to be something like the lower layer of SQLAlchemy, i.e. SQLAlchemy Core, the library that is used by the upper layer, SQLAlchemy ORM. See the SQLAlchemy (0.8) documentation.

From the pyDAL site:

[ It dynamically generates the SQL in real time using the specified dialect for the database back end, so that you do not have to write SQL code or learn different SQL dialects (the term SQL is used generically), and your code will be portable among different types of databases.

pyDAL comes from the original web2py's DAL, with the aim of being wide-compatible. pyDAL doesn't require web2py and can be used in any Python context. ]

IOW, pyDAL has been separated out into a different project from web2py, a Python web framework, of which it was originally a part.

The use of an ORM (Object Relational Mapper) vs. writing plain SQL code (vs. using an intermediate option like pyDAL or SQLAlchemy Core), can be controversial; there are at least some pros and cons on both (or all 3) sides. I've read some about this, and have got some experience with using some of these options in different projects, but am not an expert on which is the best approach, and also, it can vary depending on your project's needs, so I'm not getting into that topic in this post.

pyDAL seems to support many popular databases, mostly SQL ones, but also a NoSQL one or two, and even IMAP. Here is a list, from the site: SQLite, PostgreSQL, MySQL, Oracle, MSSQL, FireBird, DB2, Informix, Ingres, Cubrid, Sybase, Teradata, SAPDB, MongoDB, IMAP.

For some of those databases, it uses PyMySQL, pyodbc or fbd, which are all Python database libraries that I had blogged about earlier.

I tried out pyDAL a little, with this simple program, adapted from its documentation:

import sys
import time
from pydal import DAL, Field
db = DAL('sqlite://storage.db')
db.define_table('product', Field('name'))
t1 = time.time()
num_rows = int(sys.argv[1])
for product_number in range(num_rows):
    db.product.insert(name='Product-'.format(str(product_number).zfill(4)))
t2 = time.time()
print "time to insert {} rows = {} seconds".format(num_rows, int(t2 - t1))
query = db.product.name
t1 = time.time()
rows = db(query).select()
for idx, row in enumerate(rows):
    #print idx, row.name
    pass
t2 = time.time()
print "time to select {} rows = {} seconds".format(num_rows, int(t2 - t1))

It worked, and gave this output:

$ python test_pydal2.py 100000
No handlers could be found for logger "web2py"
time to insert 100000 rows = 18 seconds
time to select 100000 rows = 7 seconds

Note: I first ran it with this statement uncommented:
#print idx, row.name
to confirm that it did select the records, and then commented it and replaced it with "pass" in order to time the select without the overhead of displaying the records to the screen.

I'll check out pyDAL some more, for other commonly needed database operations, and may write about it here.
There may be a way to disable that message about a logger.

The timing statements in the code and the time output can be ignored for now, since they are not meaningful without doing a comparison against the same operations done without pyDAL (i.e. just using SQL from Python with the DB API). I will do a comparison later on and blog about it if anything interesting is found.

- Vasudev Ram - Dancing Bison Enterprises - Python training and consulting

Signup to hear about new products or services from me.

Contact Page

1 comment:

Vasudev Ram said...

Oops, made a mistake in one line in the code above:

The line:

db.product.insert(name='Product-'.format(str(product_number).zfill(4))

should be:

db.product.insert(name='Product-{}'.format(str(product_number).zfill(4))

i.e. there should be a {} after 'Product-' but within the same string . Otherwise the Product- names won't end with the serial number.