[Python-Dev] Dictionary sparseness

Agthorr agthorr@barsoom.org
Sun, 11 May 2003 20:28:18 -0700


On Sun, May 11, 2003 at 09:47:33PM -0400, Tim Peters wrote:
> Possibly, but it's fraught with difficulties.  For example, Python dicts can
> be indexed by lots of things besides 8-bit strings, and you generally need
> to know a great deal about the internal structure of a key type to generate
> a sensible hash function.  

> A more fundamental problem is that minimality can be harmful when
> failing lookups are frequent: a sparse table has a good chance of
> hitting a null entry immediately then, but a minimal table never
> does.  In the former case full-blown key comparison can be skipped
> when a null entry is hit, in the latter case full-blown key
> comparison is always needed on a failing lookup.

Both good observations.

> For symbol table apps, Bentley & Sedgewick rehabilitated the idea of ternary
> search trees a few years ago, and I think you'd enjoy reading their papers:
> 
>     http://www.cs.princeton.edu/~rs/strings/
> 
> In particular, they're faster than hashing in the failing-lookup case.

hhmmm.. yes, those are interesting.  Thanks :-) A few months ago I
implemented suffix trees for fun and practice.  Suffix trees are based
on tries, and I used a binary-tree for each node to keep track of its
children (which the papers point out is an equivalent way of doing
ternary trees).

(Suffix trees let you input a set of strings of total length n.  This
has a cost of O(n) time and O(n) memory.  Then, you can look to see if
a string of length m is a substring of any of the strings in the set
in O(m) time; this is impressive since the number and size of the set
of strings only matters for the setup operation; it has no effect on
the lookup speed whatsoever.)

Ternary search trees seem like a good approach for string-only
dictionaries.  These seem like an inelegant optimization that might
yield performance improvements for places where non-string keys are
syntactically disallowed anyway (such as the members of a class or
module).

-- Agthorr