small, fast and cross-platform flat-file database for python

Brian Kelley bkelley at wi.mit.edu
Mon Nov 24 15:36:02 EST 2003


Tim Churches wrote:

> Brian Kelley wrote:
> 
>>I have successfully used metakit in a fairly large application and 
>>couldn't be happier with the results.  The best thing about metakit is
> 
> 
>>that you can view millions of records in a nice view with ease:
> 
> 
> Really millions of records? The metakit documentation warns that its
> efficiency starts to decline dramatically above about 250,000 records,
> and in testing I found that was true, and at 1 million records it became
> really slow.
> 

It depends upon what types of operations you are performing, of course.
I have stored 1.5 million molecules in a metakit database taking up
about 1 gigabyte.  Each compound has a variable number of data fields.
Each compound has a unique id, which helps *a lot*.  One of metakit's
flaws, though, is that it doesn't really know that a given id is unique
when it is doing internal selects and joins.  If the table size is "too
large" I always do joins on the unique keys in pure python first.  That
is, I use a python dictionary to pre-filter the necessary rows and then
recombine them.
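A minimal sketch of that dictionary pre-filter (the views and row
values here are invented for illustration; real rows would come out of
metakit views):

```python
# Join two "views" on a unique id in pure python, before handing the
# surviving rows back to metakit.  Invented sample data:
view1 = [{"id": 1, "a": 25}, {"id": 2, "a": 5}, {"id": 3, "a": 40}]
view2 = [{"id": 1, "b": 150.0}, {"id": 3, "b": 50.0}]

# Index one view by its unique key once, so each lookup is a dict
# probe instead of a scan over the other view.
by_id = {row["id"]: row for row in view2}

# Recombine: keep only ids present in both views, merging the fields.
joined = [dict(r1, **by_id[r1["id"]]) for r1 in view1 if r1["id"] in by_id]
# joined == [{"id": 1, "a": 25, "b": 150.0}, {"id": 3, "a": 40, "b": 50.0}]
```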

If you are doing a more complicated select operation and joining to an
additional table, then things may slow down quite a bit.  In the metakit
layer, this can happen during calls to view.product, which is usually
used for selections like:

I) select a,b from view1, view2 where view1.a > view2.b and view1.c < 
view2.d

My applications generally do selections like:

II) select a,b from view1, view2 where view1.id = view2.id and view1.a > 
20 and view2.b > 100.0

This is very fast since I don't have the join explosion.
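A quick way to see the difference in work between I and II, using
plain python stand-ins for the two views (the sizes are made up):

```python
import itertools

n = 1000
view1 = [{"id": i, "a": i % 50, "c": i} for i in range(n)]
view2 = [{"id": i, "b": float(i % 200), "d": i + n} for i in range(n)]

# Style I: the cross product considers every pair of rows -- n * n
# candidate combinations exist before the where-clause filters them.
pairs_examined = sum(1 for _ in itertools.product(view1, view2))

# Style II: equality on a unique key lets each row of view1 probe a
# dictionary once, so the work grows linearly with the table size.
by_id = {row["id"]: row for row in view2}
rows_examined = sum(1 for row in view1 if row["id"] in by_id)

print(pairs_examined, rows_examined)   # 1000000 vs 1000
```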

What metakit is really good for is very fast scans through columns.  If
you can organize your logic so that columns take precedence over rows
(i.e. II versus I), then metakit is very powerful even for many, many
records.  Another trick is to occasionally save the database to a new
file, which compacts internal storage and the like:

storage.write(open(newfile, 'wb'))

At least, this has been my experience ;)

If your queries are more like I, I would suggest PySQLite:
http://pysqlite.sourceforge.net/
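For what it's worth, the style-I query can be tried directly with an
sqlite database; the sqlite3 module in today's standard library
descends from PySQLite, and the tables and values below are invented:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE view1 (a INTEGER, b INTEGER, c INTEGER);
    CREATE TABLE view2 (b INTEGER, d INTEGER);
    INSERT INTO view1 VALUES (10, 1, 5), (99, 2, 500);
    INSERT INTO view2 VALUES (7, 300), (50, 2);
""")

# Query I from above: a non-equi join across two tables -- sqlite's
# query planner handles this cross-table comparison itself.
rows = con.execute("""
    SELECT view1.a, view1.b
    FROM view1, view2
    WHERE view1.a > view2.b AND view1.c < view2.d
""").fetchall()
print(rows)   # [(10, 1)]
con.close()
```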




> Tim C

Brian.






More information about the Python-list mailing list