Is 100,000 entries big for a dictionary?

Tim Peters tim.one at home.com
Sun Dec 31 16:43:22 EST 2000


[posted & mailed]

[rturpin at my-deja.com]
> I'm in the process of architecting an application with medium
> sized data sets. Python dictionaries are tempting as part of
> the representation mechanism. How do they perform when they
> have 100,000 entries?

Provided you have sufficient RAM, the theoretical performance (for lookup,
insert and delete) remains O(1) expected case, O(N) worst case, regardless
of dict size.  This is also what I've seen in practice:  Python ensures that
at least 1/3 of the internal hash table slots are unused (it uses an
extreme form of open addressing rather than buckets of linked lists), so
collisions are very rarely a problem.  It's possible to contrive keys that
will cause collisions systematically, but not easy -- unlikely to happen by
accident in 2.0.
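
For a ballpark feel, a quick-and-dirty timing loop along these lines
(unscientific -- numbers vary with machine, Python version and cache
effects) should show the cost per lookup staying essentially flat as
the dict grows:

    import time

    def lookup_time(n, trials=100000):
        # Build a dict with n small-int keys, then time `trials` lookups.
        d = {}
        for i in range(n):
            d[i] = i
        start = time.time()
        for i in range(trials):
            d[i % n]          # look it up, throw the result away
        return time.time() - start

    for n in (1000, 10000, 100000):
        print("%7d entries: %.3f sec for 100000 lookups" % (n, lookup_time(n)))

If the per-size times aren't roughly equal, something other than the dict
(swapping, most likely) is eating the time.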

If you don't have enough RAM, or collisions are frequent, performance can
be atrocious.  Curiously, no way is provided for you to find out whether you
*are* suffering frequent collisions!  If you suspect you are, (a) you need
to instrument dictobject.c; and, (b) you're probably wrong <wink>.
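
If you want to see what "atrocious" means without instrumenting anything,
force the worst case yourself:  a key type whose hash never varies sends
every key to the same slot.  A contrived (and modern-Python) sketch:

    class BadKey:
        # Every instance hashes to the same value, so each insert or
        # lookup has to probe past every earlier key:  O(N) per
        # operation, O(N**2) to build the whole dict.
        def __init__(self, n):
            self.n = n
        def __hash__(self):
            return 42
        def __eq__(self, other):
            return self.n == other.n

    d = {}
    for i in range(5000):   # a few thousand of these already takes seconds
        d[BadKey(i)] = i    # plain int keys would finish in a blink

Nothing you'd build by accident, though -- which is what (b) is getting at.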

if-you-do-have-frequent-collisions-i'll-consider-it-a-bug-ly y'rs  - tim




