Questions about bsddb

Nick Vatamaniuc vatamane at gmail.com
Wed May 9 20:07:46 EDT 2007


On May 9, 4:01 pm, sinoo... at yahoo.com wrote:
> Thanks for the info Nick. I plan on accessing the data in pretty much
> random order, and once the database is built, it will be read only.
> At this point I'm not too concerned about access times, just getting
> something to work. I've been messing around with both btree and hash
> with limited success, which led me to think that maybe I was going
> beyond some internal limit on the data size. It works great on a
> limited set of data, but once I turn it loose on the full set, usually
> several hours later, it either causes a hard reset of my machine or
> the HD grinds on endlessly with no apparent progress. Is there a limit
> to the size of data you can place per key?
>
> Thanks for the MySQL suggestion, I'll take a look.
>
> -JM

JM,

If you want, take a look at my PyDBTable on www.psipy.com.

The description and examples sections are still being finished, but
the API documentation in the source should help you get started.

It is a fast Python wrapper around MySQL, PostgreSQL, or SQLite that
buffers both queries and insertions. You just set up the database and
pass the connection parameters to the initializer; after that you can
use the pydb object as a dictionary of { primary_key : list_of_values }.
You can even create indices on individual fields and run queries like:
----------------------------------------------------------------------
pydb.query( ['id','data_field1'],
            ('id','<',10),
            ('data_field1','LIKE','Hello%') )
----------------------------------------------------------------------

That call translates into a SQL query like:

----------------------------------------------------------------------
SELECT id, data_field1 FROM ... WHERE id<10 AND data_field1 LIKE 'Hello%'
----------------------------------------------------------------------

and returns an __iterator__ over the matching rows.
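
If it helps, typical use looks roughly like the sketch below. I'm
writing this from memory, so treat the import path, the constructor
parameters, and the exact names as illustrative rather than the real
API; the documentation on the site has the actual signatures:

----------------------------------------------------------------------
# Hypothetical names; check the API docs for the actual signatures.
from pydbtable import PyDBTable

pydb = PyDBTable(host='localhost', user='jm', passwd='secret',
                 db='mydb', table='mytable')

# Dictionary-style access: { primary_key : list_of_values }
pydb[42] = ['hello', 'world']
print pydb[42]            # -> ['hello', 'world']

# Indexed query, as above; the result is an iterator
for row_id, data in pydb.query(['id', 'data_field1'],
                               ('id', '<', 10),
                               ('data_field1', 'LIKE', 'Hello%')):
    print row_id, data
----------------------------------------------------------------------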

Returning an iterator is excellent because you can walk over result
sets much larger than your virtual memory. In the background,
PyDBTable retrieves rows from the database in large batches and
caches them to optimise I/O.
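
The pattern behind that is simple enough that you can roll it
yourself on top of any DB-API cursor. Here is a minimal generic
sketch (using the standard sqlite3 module, not PyDBTable's actual
internals):

----------------------------------------------------------------------
import sqlite3

def iter_rows(conn, sql, params=(), batch_size=1000):
    """Yield rows one at a time, but pull them from the database
    in large batches to cut down on round trips."""
    cur = conn.cursor()
    cur.execute(sql, params)
    while True:
        batch = cur.fetchmany(batch_size)
        if not batch:
            break
        for row in batch:
            yield row

# Assuming example.db contains a table 'mytable' with these columns:
conn = sqlite3.connect('example.db')
for row in iter_rows(conn, "SELECT id, data FROM mytable WHERE id < ?",
                     (10,)):
    print row
----------------------------------------------------------------------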

Anyway, on my machine PyDBTable saturates the disk I/O; it runs about
as fast as the same query issued directly against MySQL.

Take care,
-Nick Vatamaniuc



