[Kevin O'Connor]
... In fact, I've always wondered why Python dictionaries use the hash algorithm instead of the more general binary tree algorithm. :-}
Speed. Zope and StandaloneZODB have a BTree package, which I've recently spent a good amount of time optimizing. Here's a timing driver: """ from time import clock as now N = 1000000 indices = range(N) def doit(constructor): d = constructor() t1 = now() for i in indices: d[i] = i t2 = now() for i in indices: assert d[i] == i t3 = now() for i in indices: del d[i] t4 = now() return t2-t1, t3-t2, t4-t3 def drive(constructor, n): print "Using", constructor.__name__, "on", N, "entries" for i in range(n): d1, d2, d3 = doit(constructor) print "construct %6.3f" % d1 print "query %6.3f" % d2 print "remove %6.3f" % d3 def dict(): return {} from BTrees.OOBTree import OOBTree drive(OOBTree, 3) drive(dict, 3) """ This is a little strained because I'm running it under Python 2.1.3. This favors the BTrees, because I also spent a lot of time optimizing Python's dicts for the Python 2.2 release; 2.1 doesn't have that stuff. OOBTrees are most similar to Python's dicts, mapping objects to objects. Here's a run: Using OOBTree on 1000000 entries construct 5.376 query 5.571 remove 4.065 construct 5.349 query 5.610 remove 4.211 construct 5.363 query 5.585 remove 4.374 Using dict on 1000000 entries construct 1.411 query 1.336 remove 0.780 construct 1.382 query 1.335 remove 0.781 construct 1.376 query 1.334 remove 0.778 There's just no contest here. BTrees have many other virtues, like supporting range searches, and automatically playing nice with ZODB persistence, but they're plain sluggish compared to dicts. To be completely fair and unfair at the same time <wink>, there are also 4 other flavors of Zope BTree, purely for optimization reasons. In particular, the IIBTree maps Python ints to Python ints, and does so by avoiding Python int objects altogether, storing C longs directly and comparing them at native "compare a long to a long" C speed. That's *almost* as fast as Python 2.1 int->int dicts (which endure all-purpose Python object comparison), except for deletion (the BTree spends a lot of time tearing apart all the tree pointers again). Now that's a perfectly height-balanced search tree that "chunks up" blocks of keys for storage and speed efficiency, and rarely needs more than a simple local adjustment to maintain balance. I expect that puts it at the fast end of what can be achieved with a balanced tree scheme. The ABC language (which Guido worked on before Python) used AVL trees for just about everything under the covers. It's not a coincidence that Python doesn't use balanced trees for anything <wink>.