[Python-Dev] Dictionary sparseness

Jack Diederich jack@performancedrivers.com
Mon, 12 May 2003 02:16:57 -0400


On Sun, May 11, 2003 at 09:47:33PM -0400, Tim Peters wrote:
> [Agthorr]
> > An alternate optimization would be the additional of an immutable
> > dictionary type to the language, initialized from a mutable dictionary
> > type.  Upon creation, this dictionary would optimize itself, in a
> > manner similar to "gperf" program which creates (nearly) minimal
> > zero-collision hash tables.
>
> For symbol table apps, Bentley & Sedgewick rehabilitated the idea of ternary
> search trees a few years ago, and I think you'd enjoy reading their papers:
> 
>     http://www.cs.princeton.edu/~rs/strings/
> 
> In particular, they're faster than hashing in the failing-lookup case.

They nest well too.  And you can do some caching if the higher level trees
are unchaning (local scope can shortcut into builtins).

I have a pure-python ternary tree and a C w/python wrappers of ternary trees
lying around.  They were written with symbol tables is mind, I haven't touched
em since my presentation proposal on the topic [ternary trees in general,
replacing python symbol dict w/ t-trees as the closing example] was declined 
for the Portland OReilly thingy (bruised ego, sour grapes, et al).

Cut-n-paste from an off-list for this undying thread below.
Hettinger's idea of treaps is a good one.  A ternary-treap would also
be possible.

-jack

[Raymond]
> My thought is to use a treap.  The binary search side would scan the
> hash values while the heap part would organize from most frequent to
> least frequently accessed key.  It could even be dynamic and re-arrange
> the heap according to usage patterns.

[me]
treaps would probably be a better fit than ternary trees, espcially for
builtins for the reasons you mention.  A good default ordering would
go a long way.

[me, about ternary trees]
They nest nicely, a valid 'next' node can be another ternary tree, so
pseudo code for import would be

newmodule = __import__('mymodule')
# assume __module_symdict__ is the module's symbol table
__module_symdict__['mymodule.'] = newmodule.__module_symdict__

a lookup for 'mymodule.some_function' would happily run from the
current module's tree into the 'mymodule' tree.  The '.' seperator
would only remain speical from a user's point of view.

If symbols don't share leading characters, ternary trees are just
binary trees that require additional bookkeeping.  This is probably
the case, so ternary trees become less neat [even if they do make for
prettier pictures].