[Python-Dev] Dictionary sparseness
Jack Diederich
jack@performancedrivers.com
Mon, 12 May 2003 02:16:57 -0400
On Sun, May 11, 2003 at 09:47:33PM -0400, Tim Peters wrote:
> [Agthorr]
> > An alternate optimization would be the additional of an immutable
> > dictionary type to the language, initialized from a mutable dictionary
> > type. Upon creation, this dictionary would optimize itself, in a
> > manner similar to "gperf" program which creates (nearly) minimal
> > zero-collision hash tables.
>
> For symbol table apps, Bentley & Sedgewick rehabilitated the idea of ternary
> search trees a few years ago, and I think you'd enjoy reading their papers:
>
> http://www.cs.princeton.edu/~rs/strings/
>
> In particular, they're faster than hashing in the failing-lookup case.
They nest well too. And you can do some caching if the higher level trees
are unchaning (local scope can shortcut into builtins).
I have a pure-python ternary tree and a C w/python wrappers of ternary trees
lying around. They were written with symbol tables is mind, I haven't touched
em since my presentation proposal on the topic [ternary trees in general,
replacing python symbol dict w/ t-trees as the closing example] was declined
for the Portland OReilly thingy (bruised ego, sour grapes, et al).
Cut-n-paste from an off-list for this undying thread below.
Hettinger's idea of treaps is a good one. A ternary-treap would also
be possible.
-jack
[Raymond]
> My thought is to use a treap. The binary search side would scan the
> hash values while the heap part would organize from most frequent to
> least frequently accessed key. It could even be dynamic and re-arrange
> the heap according to usage patterns.
[me]
treaps would probably be a better fit than ternary trees, espcially for
builtins for the reasons you mention. A good default ordering would
go a long way.
[me, about ternary trees]
They nest nicely, a valid 'next' node can be another ternary tree, so
pseudo code for import would be
newmodule = __import__('mymodule')
# assume __module_symdict__ is the module's symbol table
__module_symdict__['mymodule.'] = newmodule.__module_symdict__
a lookup for 'mymodule.some_function' would happily run from the
current module's tree into the 'mymodule' tree. The '.' seperator
would only remain speical from a user's point of view.
If symbols don't share leading characters, ternary trees are just
binary trees that require additional bookkeeping. This is probably
the case, so ternary trees become less neat [even if they do make for
prettier pictures].