There's just no contest here. BTrees have many other virtues, like supporting range searches and automatically playing nice with ZODB persistence, but they're plain sluggish compared to dicts. To be completely fair and unfair at the same time <wink>, there are also 4 other flavors of Zope BTree, purely for optimization reasons. In particular, the IIBTree maps Python ints to Python ints, and does so by avoiding Python int objects altogether, storing C longs directly and comparing them at native "compare a long to a long" C speed. That's *almost* as fast as Python 2.1 int->int dicts (which endure all-purpose Python object comparison), except for deletion (the BTree spends a lot of time tearing apart all the tree pointers again).
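Anyone who wants to poke at that trade-off today can do so with a rough timing sketch like the one below. It assumes the standalone BTrees package that Zope's implementation later became (pip install BTrees); the sizes and repeat counts are arbitrary, and the fill+delete number deliberately lumps a fill in with the deletions, so read it as a rough comparison only.

    # Rough timing sketch (not from the thread): dict vs. IIBTree for
    # fill, lookup, and delete. Assumes the standalone "BTrees" package
    # from the ZODB project (pip install BTrees).
    import timeit
    from BTrees.IIBTree import IIBTree

    N = 100_000
    keys = list(range(N))

    def fill(make):
        m = make()
        for k in keys:
            m[k] = k
        return m

    for name, make in (("dict", dict), ("IIBTree", IIBTree)):
        t_fill = timeit.timeit(lambda: fill(make), number=3)
        m = fill(make)
        t_get = timeit.timeit(lambda: sum(m[k] for k in keys), number=3)

        def drain():
            fresh = fill(make)   # rough: timing includes one fill
            for k in keys:
                del fresh[k]     # delete key by key, worst case for the tree
        t_del = timeit.timeit(drain, number=3)

        print(f"{name:8s} fill {t_fill:.3f}s  lookup {t_get:.3f}s  "
              f"fill+delete {t_del:.3f}s")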
Now that's a perfectly height-balanced search tree that "chunks up" blocks of keys for storage and speed efficiency, and rarely needs more than a simple local adjustment to maintain balance. I expect that puts it at the fast end of what can be achieved with a balanced tree scheme.
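The range searches mentioned at the top are where that tree structure pays off. A minimal sketch, again assuming the standalone BTrees package, whose trees accept inclusive min/max bounds on keys() and items() — something a dict can't do without scanning every key:

    # Minimal range-search sketch (assumes the standalone BTrees package).
    from BTrees.IIBTree import IIBTree

    tree = IIBTree()
    for k in range(1000):
        tree[k] = k * k

    print(list(tree.keys(100, 110)))   # keys in [100, 110], in sorted order
    print(list(tree.items(100, 110)))  # the matching (key, value) pairs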
This is really interesting. When I was at Dragon (well, actually, after Tim left and it became L&H), I ported my natural language parsing/understanding system from Python to C++ so it could run quickly enough for embedded devices. The core of this system was an associative container, so I knew that its performance would be crucial. I used C++ generics, which made it really easy to swap in different associative container implementations, and I tried lots, including the red-black tree containers built into most C++ implementations, and hash tables.

My experience was that coming up with a hash function that would give a significant speed increase over the tree containers was extremely difficult, because it was really hard to come up with a good hash function. Furthermore, even when I succeeded, it was like black magic: the results were inconsistent across my test cases, and there was no way to understand why it worked well, or to get a feeling for how it would scale to problems outside those cases. I ended up hand-coding a two-level scheme based on binary searches in contiguous arrays, which blew away anything I'd been able to do with a hash table.

My conclusion was that for general-purpose use, the red-black tree was pretty good, despite its relatively high memory overhead of 3 pointers per node: it places easy requirements on the user (supply a strict weak ordering) and provides predictable and smooth performance, even asymptotically. On the other hand, hashing requires that the user supply both a hash function and an equality detector which must agree with one another, requires hand-tuning of the hash function for performance, and is rather more unpredictable.

We've been talking about adding hash-based containers to the C++ standard library, but I'm reluctant on these grounds. It seems to me that when you really care about speed, some kind of hand-coded solution might be a better investment than trying to come up with a good hash function. I'm ready to believe that hashing is the most appropriate choice for Python, but I wonder what makes the difference?

-Dave
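Dave's two-level container isn't shown in the thread; as a stand-in, here is the simplest one-level form of the binary-searches-in-contiguous-arrays idea in Python, using the stdlib bisect module. SortedArrayMap is illustrative only, not his actual code, and it skips the second level entirely:

    # Illustrative stand-in for the "binary searches in contiguous arrays"
    # idea; a one-level simplification, not Dave's actual two-level scheme.
    import bisect

    class SortedArrayMap:
        """Associative container: parallel sorted-key and value arrays."""

        def __init__(self):
            self._keys = []
            self._values = []

        def __setitem__(self, key, value):
            i = bisect.bisect_left(self._keys, key)
            if i < len(self._keys) and self._keys[i] == key:
                self._values[i] = value       # key present: overwrite
            else:
                self._keys.insert(i, key)     # O(n) shift, but contiguous
                self._values.insert(i, value)

        def __getitem__(self, key):
            i = bisect.bisect_left(self._keys, key)
            if i < len(self._keys) and self._keys[i] == key:
                return self._values[i]
            raise KeyError(key)

    m = SortedArrayMap()
    for word in "the quick brown fox".split():
        m[word] = len(word)
    print(m["quick"])   # -> 5

Lookups are O(log n) over a contiguous array, which is very cache-friendly; the cost Dave's second level presumably amortized is the O(n) insertion shift.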
From: "Tim Peters" <tim.one@comcast.net>

The ABC language (which Guido worked on before Python) used AVL trees for just about everything under the covers. It's not a coincidence that Python doesn't use balanced trees for anything <wink>.