Large Dictionaries
Chris Foote
chris at foote.com.au
Wed May 17 09:28:20 EDT 2006
Klaas wrote:
>> 22.2s 20m25s[3]
>
> 20m to insert 1m keys? You are doing something wrong.
Hi Mike.
I've put together some simplified test code, but the bsddb
module still takes about 11 minutes for 1M keys:
Number generator test for 1000000 number ranges
with a maximum of 3 wildcard digits.
Wed May 17 22:18:17 2006 dictionary population started
Wed May 17 22:18:26 2006 dictionary population stopped, duration 8.6s
Wed May 17 22:18:27 2006 StorageBerkeleyDB population started
Wed May 17 22:29:32 2006 StorageBerkeleyDB population stopped, duration 665.6s
Wed May 17 22:29:33 2006 StorageSQLite population started
Wed May 17 22:30:38 2006 StorageSQLite population stopped, duration 65.5s
Test code is attached.
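For readers without the attachment, here is a minimal sketch of this kind of timing comparison. It is not the attached script: the helper names (timed, make_items, populate_dict, populate_sqlite), the table layout and the key format are all assumptions made for illustration, and it uses the standard sqlite3 module with an in-memory database rather than the storage classes named in the output above.

```python
import sqlite3
import time


def timed(label, func):
    """Run func(), print its wall-clock duration, return (result, seconds)."""
    start = time.time()
    result = func()
    duration = time.time() - start
    print("%s population stopped, duration %.1fs" % (label, duration))
    return result, duration


def make_items(count):
    """Hypothetical test data: zero-padded numeric strings as keys."""
    return [("%07d" % i, str(i)) for i in range(count)]


def populate_dict(items):
    """Baseline: load every pair into a plain in-memory dictionary."""
    return dict(items)


def populate_sqlite(items):
    """Load every pair into an in-memory SQLite table (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE numbers (k TEXT PRIMARY KEY, v TEXT)")
    # The whole batch goes in as one transaction, committed on exit.
    with conn:
        conn.executemany("INSERT INTO numbers (k, v) VALUES (?, ?)", items)
    return conn


if __name__ == "__main__":
    items = make_items(100000)
    timed("dictionary", lambda: populate_dict(items))
    timed("SQLite", lambda: populate_sqlite(items))
```

The absolute numbers will differ from the run above, since this sketch keeps everything in memory; it is only meant to show the shape of the measurement.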
> With bdb's it is crucial to insert keys in bytestring-sorted order.
For the bsddb test, I'm using a plain string. (The module docs list
strings as the only datatype supported for both keys & values.)
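Read literally, the quoted advice concerns the order in which keys are inserted rather than their datatype. A minimal sketch of sorting keys as bytestrings before inserting them follows; insert_sorted and to_bytes are hypothetical helpers, the UTF-8 encoding is an assumption, and "store" stands in for any dict-like object (a plain dict here, not a real bsddb handle).

```python
def to_bytes(value):
    """Return value as a bytestring (UTF-8 encoding is an assumption)."""
    return value if isinstance(value, bytes) else value.encode("utf-8")


def insert_sorted(store, items):
    """Insert (key, value) pairs into store in bytestring-sorted key order.

    Returns the keys in the order they were inserted.
    """
    encoded = [(to_bytes(k), to_bytes(v)) for k, v in items]
    # bytes objects compare byte by byte, so b"100" sorts before b"9".
    encoded.sort(key=lambda pair: pair[0])
    for k, v in encoded:
        store[k] = v
    return [k for k, _ in encoded]


if __name__ == "__main__":
    store = {}  # stand-in for a dict-like database handle
    order = insert_sorted(store, [("9", "a"), ("10", "b"), ("100", "c")])
    print(order)
```

Note that bytestring order is not numeric order: "100" is inserted before "9" unless the keys are zero-padded to a fixed width.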
> Also, be sure to give it a decent amount of cache.
The bsddb.hashopen() factory seems to have a bug in this regard; if you
supply a cachesize argument, then it barfs:
...
  File "bsddb-test.py", line 67, in runtest
    db = bsddb.hashopen(None, flag='c', cachesize=8192)
  File "/usr/lib/python2.4/bsddb/__init__.py", line 288, in hashopen
    if cachesize is not None: d.set_cachesize(0, cachesize)
bsddb._db.DBInvalidArgError: (22, 'Invalid argument -- DB->set_cachesize: method not permitted when environment specified')
I'll file a bug report on this if it isn't already fixed.
Cheers,
Chris
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bsddb-test.py
Type: text/x-python
Size: 3387 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/python-list/attachments/20060517/84f0df16/attachment.py>