For Large Dictionaries Could One Use Separate Dictionaries Where Each Dictionary Covers an Interval of the Input Range?
Graham Fawcett
graham.fawcett at gmail.com
Tue May 16 15:35:38 EDT 2006
On 5/16/06, skip at pobox.com <skip at pobox.com> wrote:
>
> Graham> Looking up a key in a dictionary is done in constant-time,
> Graham> i.e. it doesn't matter how large the dictionary is.
>
> Doesn't that depend on how many keys hash to the same value? For small
> dictionaries keeping the max keys that hash to the same value small isn't a
> huge problem.
Hi Skip. On reflection, I guess I ought to have asked the OP what he
meant by "large". :-) Probably asked for details on his use-case as
well.
Sure, collisions are a reality. The introductory comments in
dictobject.c in the Python source [1] give good coverage of how these
issues are handled in CPython. And, they make for great bedtime
reading! :-)
Regardless, I can't imagine that using multiple dictionaries would
provide a sufficient speed-up for the vast majority of large
dictionary use-cases. There are still swapping issues, plus the
user-code overhead of managing the multiple dicts, plus the multiple
lookups. If the only improvement is in collision reduction (which is
questionable in the general case, since an unfortunate set of keys
could result in numerous collisions even in a smaller database), then
is the extra overhead really worth it?
Still, it's just my opinion. If someone wants to post code and prove
me wrong, I'll eat my hat and take my lickings. ;-)
Best,
Graham
[1] http://svn.python.org/view/python/trunk/Objects/dictobject.c
More information about the Python-list
mailing list