[Python-Dev] Dictionary sparseness

Walter Dörwald walter@livinglogic.de
Mon, 12 May 2003 11:56:25 +0200


Tim Peters wrote:

> [...]
> For symbol table apps, Bentley & Sedgewick rehabilitated the idea of ternary
> search trees a few years ago, and I think you'd enjoy reading their papers:
> 
>     http://www.cs.princeton.edu/~rs/strings/
> 
> In particular, they're faster than hashing in the failing-lookup case.

The digital search tries mentioned in the article seem to use the
same fundamental approach as state machines, i.e. while traversing
the string, remember the string prefix that has already been
recognized. Digital search tries traverse the tree and the
memory is in the path that has been traversed. State machines
traverse a transition table and the memory is the current state.
Digital search tries seem to be easy to update, while state machine
are not.

Has anybody tried state machines for symbol tables in Python? The
size of the transition table might be a problem and any attempt
to reduce the size might kill performance in the inner loop.
Performancewise stringobject.c/string_hash() is hard to
beat (especially when the hash value is already cached).

Bye,
    Walter Dörwald