[sapug] Large dictionaries

Daryl Tester Daryl.Tester at iocane.com.au
Thu May 11 11:39:28 CEST 2006


Chris Foote wrote:

> I have the need to store a large (10M) number of keys in a hash table,
> based on a tuple of (long_integer, integer).  The standard Python
> dictionary works well for small numbers of keys, but starts to perform
> badly for me when inserting roughly 5M keys:

Is this a startup operation, or are you expecting to insert that
many records that often?  I was going to suggest using cPickle to
load a precomputed dictionary, but a quick test shows the performance
is probably worse.
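
Roughly the shape of that quick test, for the curious (the key count,
filename and key layout are just illustrative):

    import time
    import cPickle

    N = 5000000
    t0 = time.time()
    d = {}
    for i in xrange(N):
        d[(long(i), i)] = i        # (long_integer, integer) tuple keys
    print "build: %.1fs" % (time.time() - t0)

    f = open('big.pkl', 'wb')
    cPickle.dump(d, f, cPickle.HIGHEST_PROTOCOL)
    f.close()

    t0 = time.time()
    d2 = cPickle.load(open('big.pkl', 'rb'))
    print "load:  %.1fs" % (time.time() - t0)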

I haven't looked at the modern dictionary code, but I seem to recall
that 1.5's code would expand the internal hash table and perform
rebalancing operations when the collision rate got too high (the
rebalancing being the heavy cost).  I thought there might be a
parameter to {}.__init__() to predefine the hash table size,
but alas this seems to have put the kibosh on that:

<http://mail.python.org/pipermail/python-bugs-list/2002-June/012091.html>
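
If you want to see where that cost actually lands, timing the inserts
in chunks makes the table expansions visible as slow chunks (a sketch;
the chunk size is arbitrary):

    import time

    d = {}
    for chunk in range(10):
        t0 = time.time()
        for i in xrange(chunk * 1000000, (chunk + 1) * 1000000):
            d[(long(i), i)] = None
        print "%dM-%dM: %.2fs" % (chunk, chunk + 1, time.time() - t0)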

I find writing new types for Python to be pretty straightforward;
you could always find the (pre-) hash library of your dreams and
bolt that to a new type (or class).
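
As a toy illustration of the shape of it (pure Python, so it won't beat
the built-in dict on speed; a real version would be a C type wrapping
your chosen hash library) - a presized, open-addressed table that never
rehashes:

    class FixedHash(object):
        def __init__(self, size):
            # Presize once; no resizing or rehashing ever happens.
            # Assumes size comfortably exceeds the number of keys,
            # or the probe loops below degrade (and never terminate
            # on a completely full table).
            self._size = size
            self._slots = [None] * size

        def __setitem__(self, key, value):
            i = hash(key) % self._size
            # Linear probing: walk until an empty slot or the key.
            while self._slots[i] is not None and self._slots[i][0] != key:
                i = (i + 1) % self._size
            self._slots[i] = (key, value)

        def __getitem__(self, key):
            i = hash(key) % self._size
            while self._slots[i] is not None:
                if self._slots[i][0] == key:
                    return self._slots[i][1]
                i = (i + 1) % self._size
            raise KeyError(key)

    h = FixedHash(20000000)        # sized well above 10M keys
    h[(12345678901L, 42)] = 'value'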

> p.s. Disk-based DBs are out of the question because most key lookups
> will result in a miss, and lookup time is critical for this application.

A Python interface to cdb, perhaps?  <http://pilcrow.madison.wi.us/#pycdb>

Given enough memory you could probably cache the entire file in RAM.
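
From memory the pycdb interface is along the lines below - do check it
against the module's own docs.  cdb keys are byte strings, so the tuple
keys would need packing first (struct layout here is an assumption):

    import struct
    import cdb

    def packed(key):
        # Pack the (long_integer, integer) tuple into a fixed
        # 12-byte key; assumes the values fit in 'q' and 'i'.
        return struct.pack('<qi', key[0], key[1])

    # Build the file once (cdbmake/add/finish, as I recall the API):
    maker = cdb.cdbmake('keys.cdb', 'keys.cdb.tmp')
    maker.add(packed((12345678901L, 42)), 'some value')
    maker.finish()

    # Lookups; get() returning None on a miss suits a workload
    # where most lookups are misses.
    db = cdb.init('keys.cdb')
    print db.get(packed((12345678901L, 42)))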

Cheers.

-- 
Regards,
  Daryl Tester, IOCANE Pty. Ltd.

