[Python-Dev] Dictionary sparseness

Tim Peters tim.one@comcast.net
Sun, 11 May 2003 21:47:33 -0400


[Agthorr]
> An alternate optimization would be the additional of an immutable
> dictionary type to the language, initialized from a mutable dictionary
> type.  Upon creation, this dictionary would optimize itself, in a
> manner similar to "gperf" program which creates (nearly) minimal
> zero-collision hash tables.

Possibly, but it's fraught with difficulties.  For example, Python dicts can
be indexed by lots of things besides 8-bit strings, and you generally need
to know a great deal about the internal structure of a key type to generate
a sensible hash function.  A more fundamental problem is that minimality can
be harmful when failing lookups are frequent:  a sparse table has a good
chance of hitting a null entry immediately then, but a minimal table never
does.  In the former case full-blown key comparison can be skipped when a
null entry is hit, in the latter case full-blown key comparison is always
needed on a failing lookup.

For symbol table apps, Bentley & Sedgewick rehabilitated the idea of ternary
search trees a few years ago, and I think you'd enjoy reading their papers:

    http://www.cs.princeton.edu/~rs/strings/

In particular, they're faster than hashing in the failing-lookup case.