Large Dictionaries

Claudio Grondi claudio.grondi at freenet.de
Tue May 16 14:09:10 CEST 2006


Chris Foote wrote:
> Claudio Grondi wrote:
> 
>> Chris Foote wrote:
>>
>>> p.s. Disk-based DBs are out of the question because most
>>> key lookups will result in a miss, and lookup time is
>>> critical for this application.
>>>
>> Python Bindings (\Python24\Lib\bsddb vers. 4.3.0) and the DLL for 
>> BerkeleyDB (\Python24\DLLs\_bsddb.pyd vers. 4.2.52) are included in 
>> the standard Python 2.4 distribution.
> 
> 
> However, please note that the Python bsddb module doesn't support
> in-memory based databases - note the library documentation's[1] wording:
> 
>     "Files never intended to be preserved on disk may be created by
>     passing None as the filename."
> 
> which closely mirrors the Sleepycat documentation[2]:
> 
>     "In-memory databases never intended to be preserved on disk
>     may be created by setting the file parameter to NULL."
> 
> It does actually use a temporary file (in /var/tmp), for which 
> performance for my purposes is unsatisfactory:
> 
> # keys   dictionary  metakit  bsddb  (all using psyco)
> ------   ----------  -------  -----
> 1M            8.8s     22.2s  20m25s[3]
> 2M           24.0s     43.7s  N/A
> 5M          115.3s    105.4s  N/A
> 
> Cheers,
> Chris
> 
> [1] bsddb docs:
>     http://www.python.org/doc/current/lib/module-bsddb.html
> 
> [2] Sleepycat BerkeleyDB C API:
>     http://www.sleepycat.com/docs/api_c/db_open.html
> 
> [3] Wall clock time.  Storing the (long_integer, integer) key in string 
> form "long_integer:integer" since bsddb doesn't support keys that aren't 
> integers or strings.
I have to admit that I haven't written any code of my own to actually test
this, but if 20m25s for storing about a million strings in the index
column of a database table is really what you are getting, I can't shake
the feeling that there is something fundamentally wrong with the way you
are doing it.
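One place worth checking is the key format from footnote [3]. Below is a minimal sketch, in Python 3 rather than the thread's Python 2.4 (where a plain str plays the role of bytes), of that "long_integer:integer" text encoding next to a fixed-width binary alternative. All function names are hypothetical, and the binary variant assumes 0 <= big < 2**64 and 0 <= small < 2**32; nothing here is measured, so which form is faster in bsddb remains an assumption to test.

```python
import struct
from typing import Tuple


def encode_key(big: int, small: int) -> bytes:
    """Text key b"big:small", the form described in footnote [3]."""
    return f"{big}:{small}".encode("ascii")


def decode_key(raw: bytes) -> Tuple[int, int]:
    """Inverse of encode_key."""
    big, small = raw.decode("ascii").split(":")
    return int(big), int(small)


# Hypothetical alternative: a fixed-width 12-byte key (unsigned 8-byte +
# unsigned 4-byte, big-endian). Assumes 0 <= big < 2**64, 0 <= small < 2**32.
# Big-endian packing makes byte order match numeric order, which text keys
# do not (b"10:1" sorts before b"9:1").
_PACKED = struct.Struct(">QI")


def pack_key(big: int, small: int) -> bytes:
    return _PACKED.pack(big, small)


def unpack_key(raw: bytes) -> Tuple[int, int]:
    return _PACKED.unpack(raw)
```

The text form is easy to inspect by eye; the packed form has a constant length and sorts numerically in a B-tree, at the cost of the range restrictions noted above.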

Posting the code for your test cases seems to me the only way to find
out what is behind the mystery you are seeing here (it would also clear
up the other puzzles raised by posters in this thread so far).
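A self-contained test case of the kind requested might look like the sketch below. It times only a plain Python 3 dict; the key shape (48-bit and 16-bit random integers), the miss-heavy probe set, and the function names are assumptions, and a bsddb or metakit backend would have to be substituted for `table` to compare storage engines.

```python
import random
import time
from typing import Dict, List, Tuple

Key = Tuple[int, int]


def make_keys(n: int, seed: int) -> List[Key]:
    """Hypothetical key shape: (48-bit 'long', 16-bit 'int') pairs."""
    rng = random.Random(seed)
    return [(rng.getrandbits(48), rng.randrange(1 << 16)) for _ in range(n)]


def benchmark(n: int, lookups: int, seed: int = 0) -> Dict[str, float]:
    keys = make_keys(n, seed)
    table: Dict[Key, int] = {}

    # Time n individual inserts using wall-clock time, as in the table above.
    start = time.perf_counter()
    for i, key in enumerate(keys):
        table[key] = i
    insert_s = time.perf_counter() - start

    # Probes come from a different seed, so nearly all of them miss,
    # mimicking the "most key lookups will result in a miss" workload.
    probes = make_keys(lookups, seed + 1)
    start = time.perf_counter()
    hits = sum(1 for p in probes if p in table)
    lookup_s = time.perf_counter() - start

    return {"stored": len(table), "hits": hits,
            "insert_s": insert_s, "lookup_s": lookup_s}
```

Calling, e.g., `print(benchmark(1_000_000, 1_000_000))` reports insert and lookup times separately, which makes it easier to tell whether a slow run is dominated by building the table or by the miss-heavy lookups.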

Claudio
