[sapug] Large dictionaries

Chris Foote chris at inetd.com.au
Thu May 11 15:36:51 CEST 2006


On Thu, 11 May 2006, Michael Cohen wrote:

> Hi Chris,
>  You probably need to balance the hash table. Normally a dict works by hashing
>  the key into an array of a certain size. The array is indexed by the hash
>  value, and a linked list of values is attached to each slot in the array.
>
>  If you load the table too much, there will be many hash collisions, which
>  will cause the linked lists to get very long. This will slow down access
>  time dramatically.
>
>  Apart from not using a tuple as the dict key (I'm not sure how efficient
>  the algorithm for hashing a Python tuple struct is), I would recommend
>  splitting the single dict into a dict of dicts. This has the effect of
>  spreading the load on the hash table:
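
(A minimal sketch of the split Michael describes; the integer tuple
keys are an assumption, since the original code wasn't quoted:)

    # flat dict keyed by a tuple
    flat = {}
    flat[(12345, 67890)] = "value"

    # dict of dicts: the first key picks an inner dict, spreading
    # entries across many smaller hash tables
    nested = {}
    nested.setdefault(12345, {})[67890] = "value"

    # lookup mirrors the two-level structure
    value = nested[12345][67890]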

Good thinking, but alas, the overhead of dictionary creation for the
second level was too much:

               dictionary  metakit  sqlite[1]  hash-in-hash
               ----------  -------  ---------  ------------
1M numbers           8.8s    22.2s      89.6s         21.9s
2M numbers          24.0s    43.7s     190.0s         56.8s
5M numbers         115.3s   105.4s        N/A     > 185s[2]

[1] pysqlite V1 & sqlite V3.

[2] I had to kill the process because it chewed up all 1 GB of RAM :-(
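
For anyone wanting to reproduce the "dictionary" and "hash-in-hash"
columns, a rough harness along these lines would do it (the random
integer-pair keys are an assumption about the workload):

    import random, time

    def bench(n):
        # n random (a, b) integer pairs to use as keys
        keys = [(random.randrange(10000000), random.randrange(10000000))
                for _ in range(n)]

        t0 = time.time()
        flat = {}
        for a, b in keys:
            flat[(a, b)] = None                  # plain dict, tuple key
        print("dict:         %.1fs" % (time.time() - t0))

        t0 = time.time()
        nested = {}
        for a, b in keys:
            nested.setdefault(a, {})[b] = None   # hash-in-hash
        print("hash-in-hash: %.1fs" % (time.time() - t0))

    bench(1000000)   # "1M numbers"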

Thanks for the suggestion, but no go.


-- 
Chris Foote <chris at inetd.com.au>
Inetd Pty Ltd T/A HostExpress
Web:   http://www.hostexpress.com.au
Blog:  http://www.hostexpress.com.au/drupal/chris
Phone: (08) 8410 4566

