[Python-Dev] The Dictionary Gem is polished!

Tim Peters tim.one@home.com
Mon, 18 Dec 2000 23:55:33 -0500


Something else to ponder:  my tests show that the current ("old") algorithm
performs much better (somewhat worse than "new2" == new algorithm + warmup)
if incr is simply initialized like so instead:

        if mp.oldalg:
            incr = (_hash & 0xffffffffL) % (mp.ma_size - 1)

That's another way to get all the bits to contribute to the result.  Note
that a mod by size-1 is analogous to "casting out nines" in decimal:  it's
the same as breaking hash into fixed-sized pieces from the right (10 bits
each if size=2**10, etc), adding the pieces together, and repeating that
process until only one piece remains.  IOW, it's a degenerate form of
division, but works well all the same.  It didn't improve over that when I
tried a mod by the largest prime less than the table size (which suggests
we're sucking all we can out of the *probe* sequence given a sometimes-poor
starting index).

However, it's subject to the same weak clustering phenomenon as the old
method due to the ill-advised "~hash" operation in computing the initial
index.  If ~ is also thrown away, it's as good as new2 (here I've tossed out
the "windows names", and "old" == existing algorithm except (a) get rid of ~
when computing index and (b) do mod by size-1 when computing incr):

N=1000
trips for strings old=230 new=261 new2=239
trips for bin strings old=0 new=0 new2=0
trips for bad integers old=999 new=13212 new2=999
trips for random integers old=399 new=421 new2=410
N=2000
trips for strings old=787 new=1066 new2=827
trips for bin strings old=0 new=0 new2=0
trips for bad integers old=0 new=26481 new2=1999
trips for random integers old=652 new=733 new2=650
N=3000
trips for strings old=547 new=760 new2=617
trips for bin strings old=0 new=0 new2=0
trips for bad integers old=0 new=38701 new2=2999
trips for random integers old=724 new=743 new2=768
N=4000
trips for strings old=1311 new=1657 new2=1315
trips for bin strings old=0 new=0 new2=0
trips for bad integers old=0 new=53014 new2=3999
trips for random integers old=1476 new=1587 new2=1493

The new and new2 values differ in minor ways from the ones you posted
because I got rid of the ~ (the ~ has a bad interaction with "additive"
means of computing incr, because the ~ tends to move the index in the
opposite direction, and these moves in opposite directions tend to cancel
out when computing incr+index the first time).

too-bad-mod-is-expensive!-ly y'rs  - tim