[Python-Dev] Dictionary sparseness
Tim Peters
tim.one@comcast.net
Sun, 11 May 2003 21:47:33 -0400
[Agthorr]
> An alternate optimization would be the additional of an immutable
> dictionary type to the language, initialized from a mutable dictionary
> type. Upon creation, this dictionary would optimize itself, in a
> manner similar to "gperf" program which creates (nearly) minimal
> zero-collision hash tables.
Possibly, but it's fraught with difficulties. For example, Python dicts can
be indexed by lots of things besides 8-bit strings, and you generally need
to know a great deal about the internal structure of a key type to generate
a sensible hash function. A more fundamental problem is that minimality can
be harmful when failing lookups are frequent: a sparse table has a good
chance of hitting a null entry immediately then, but a minimal table never
does. In the former case full-blown key comparison can be skipped when a
null entry is hit, in the latter case full-blown key comparison is always
needed on a failing lookup.
For symbol table apps, Bentley & Sedgewick rehabilitated the idea of ternary
search trees a few years ago, and I think you'd enjoy reading their papers:
http://www.cs.princeton.edu/~rs/strings/
In particular, they're faster than hashing in the failing-lookup case.