Large Dictionaries

Chris Foote chris at foote.com.au
Tue May 16 11:05:36 CEST 2006


Claudio Grondi wrote:
> Chris Foote wrote:
>> p.s. Disk-based DBs are out of the question because most
>> key lookups will result in a miss, and lookup time is
>> critical for this application.
>>
> Python Bindings (\Python24\Lib\bsddb vers. 4.3.0) and the DLL for 
> BerkeleyDB (\Python24\DLLs\_bsddb.pyd vers. 4.2.52) are included in the 
> standard Python 2.4 distribution.

However, please note that the Python bsddb module doesn't support
purely in-memory databases.  Note the wording in the library
documentation[1]:

	"Files never intended to be preserved on disk may be created by
	passing None as the filename."

which closely mirrors the Sleepycat documentation[2]:

	"In-memory databases never intended to be preserved on disk may be
	created by setting the file parameter to NULL."

In practice it still uses a temporary file (in /var/tmp), and the
resulting performance is unsatisfactory for my purposes:

# keys   dictionary  metakit  bsddb
------   ----------  -------  ---------
1M            8.8s     22.2s  20m25s[3]
2M           24.0s     43.7s  N/A
5M          115.3s    105.4s  N/A

(all timings using psyco)
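
A rough sketch of how a dictionary timing like the one above could be
reproduced.  This is not the code behind the table: the function name
time_dict_fill, the way the keys are generated, and the miss-only lookup
loop are all illustrative assumptions, and it is written for a current
Python 3 rather than Python 2.4 with psyco.

```python
import time


def time_dict_fill(n_keys):
    """Fill a dict with (long_integer, integer) tuple keys, then probe misses.

    Returns (dict size, number of misses, fill seconds, lookup seconds).
    """
    table = {}

    # Time the inserts. The key pattern is an assumption chosen only so
    # that every key is unique.
    start = time.perf_counter()
    for i in range(n_keys):
        table[(i * 1000003, i % 256)] = i
    fill_seconds = time.perf_counter() - start

    # Time lookups that always miss: the second element -1 is never stored.
    start = time.perf_counter()
    misses = 0
    for i in range(n_keys):
        if (i * 1000003, -1) not in table:
            misses += 1
    lookup_seconds = time.perf_counter() - start

    return len(table), misses, fill_seconds, lookup_seconds


if __name__ == "__main__":
    size, misses, fill_s, lookup_s = time_dict_fill(1000000)
    print("%d keys: fill %.1fs, %d missed lookups in %.1fs"
          % (size, fill_s, misses, lookup_s))
```

The lookups are deliberately all misses, since that is the common case
for this application.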

Cheers,
Chris

[1] bsddb docs:
     http://www.python.org/doc/current/lib/module-bsddb.html

[2] Sleepycat BerkeleyDB C API:
     http://www.sleepycat.com/docs/api_c/db_open.html

[3] Wall clock time.  The (long_integer, integer) key was stored in string 
form "long_integer:integer", since bsddb doesn't support keys that aren't 
integers or strings.
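
A minimal sketch of the key conversion described in [3].  The helper
names encode_key and decode_key are hypothetical, and the exact
formatting used in the benchmark may have differed.

```python
def encode_key(long_integer, integer):
    """Flatten a (long_integer, integer) pair into "long_integer:integer"."""
    return "%d:%d" % (long_integer, integer)


def decode_key(text):
    """Recover the (long_integer, integer) pair from its string form."""
    left, right = text.split(":", 1)
    return int(left), int(right)
```

For example, encode_key(12345678901234, 7) gives "12345678901234:7", and
decode_key turns that string back into the original pair.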


