[Python-Dev] Priority queue (binary heap) python code

Tim Peters tim.one@comcast.net
Tue, 25 Jun 2002 23:58:24 -0400

[Kevin O'Connor]
> ...
> In fact, I've always wondered why Python dictionaries use the hash
> algorithm instead of the more general binary tree algorithm.  :-}

Speed.  Zope and StandaloneZODB have a BTree package, which I've recently
spent a good amount of time optimizing.  Here's a timing driver:

from time import clock as now

N = 1000000
indices = range(N)

def doit(constructor):
    d = constructor()
    t1 = now()
    for i in indices:
        d[i] = i
    t2 = now()
    for i in indices:
        assert d[i] == i
    t3 = now()
    for i in indices:
        del d[i]
    t4 = now()
    return t2-t1, t3-t2, t4-t3

def drive(constructor, n):
    print "Using", constructor.__name__, "on", N, "entries"
    for i in range(n):
        d1, d2, d3 = doit(constructor)
        print "construct %6.3f" % d1
        print "query     %6.3f" % d2
        print "remove    %6.3f" % d3

def dict():
    return {}

from BTrees.OOBTree import OOBTree
drive(OOBTree, 3)
drive(dict, 3)

This is a little strained because I'm running it under Python 2.1.3.  This
favors the BTrees, because I also spent a lot of time optimizing Python's
dicts for the Python 2.2 release; 2.1 doesn't have that stuff.  OOBTrees are
most similar to Python's dicts, mapping objects to objects.  Here's a run:

Using OOBTree on 1000000 entries
construct  5.376
query      5.571
remove     4.065
construct  5.349
query      5.610
remove     4.211
construct  5.363
query      5.585
remove     4.374
Using dict on 1000000 entries
construct  1.411
query      1.336
remove     0.780
construct  1.382
query      1.335
remove     0.781
construct  1.376
query      1.334
remove     0.778

There's just no contest here.  BTrees have many other virtues, like
supporting range searches, and automatically playing nice with ZODB
persistence, but they're plain sluggish compared to dicts.  To be completely
fair and unfair at the same time <wink>, there are also 4 other flavors of
Zope BTree, purely for optimization reasons.  In particular, the IIBTree
maps Python ints to Python ints, and does so by avoiding Python int objects
altogether, storing C longs directly and comparing them at native "compare a
long to a long" C speed.  That's *almost* as fast as Python 2.1 int->int
dicts (which endure all-purpose Python object comparison), except for
deletion (the BTree spends a lot of time tearing apart all the tree pointers

Now that's a perfectly height-balanced search tree that "chunks up" blocks
of keys for storage and speed efficiency, and rarely needs more than a
simple local adjustment to maintain balance.  I expect that puts it at the
fast end of what can be achieved with a balanced tree scheme.

The ABC language (which Guido worked on before Python) used AVL trees for
just about everything under the covers.  It's not a coincidence that Python
doesn't use balanced trees for anything <wink>.