[Python-Dev] The Dictionary Gem is polished!
Tim Peters
tim.one@home.com
Mon, 18 Dec 2000 23:55:33 -0500
Something else to ponder: my tests show that the current ("old") algorithm
performs much better (somewhat worse than "new2" == new algorithm + warmup)
if incr is simply initialized like so instead:
if mp.oldalg:
incr = (_hash & 0xffffffffL) % (mp.ma_size - 1)
That's another way to get all the bits to contribute to the result. Note
that a mod by size-1 is analogous to "casting out nines" in decimal: it's
the same as breaking hash into fixed-sized pieces from the right (10 bits
each if size=2**10, etc), adding the pieces together, and repeating that
process until only one piece remains. IOW, it's a degenerate form of
division, but works well all the same. It didn't improve over that when I
tried a mod by the largest prime less than the table size (which suggests
we're sucking all we can out of the *probe* sequence given a sometimes-poor
starting index).
However, it's subject to the same weak clustering phenomenon as the old
method due to the ill-advised "~hash" operation in computing the initial
index. If ~ is also thrown away, it's as good as new2 (here I've tossed out
the "windows names", and "old" == existing algorithm except (a) get rid of ~
when computing index and (b) do mod by size-1 when computing incr):
N=1000
trips for strings old=230 new=261 new2=239
trips for bin strings old=0 new=0 new2=0
trips for bad integers old=999 new=13212 new2=999
trips for random integers old=399 new=421 new2=410
N=2000
trips for strings old=787 new=1066 new2=827
trips for bin strings old=0 new=0 new2=0
trips for bad integers old=0 new=26481 new2=1999
trips for random integers old=652 new=733 new2=650
N=3000
trips for strings old=547 new=760 new2=617
trips for bin strings old=0 new=0 new2=0
trips for bad integers old=0 new=38701 new2=2999
trips for random integers old=724 new=743 new2=768
N=4000
trips for strings old=1311 new=1657 new2=1315
trips for bin strings old=0 new=0 new2=0
trips for bad integers old=0 new=53014 new2=3999
trips for random integers old=1476 new=1587 new2=1493
The new and new2 values differ in minor ways from the ones you posted
because I got rid of the ~ (the ~ has a bad interaction with "additive"
means of computing incr, because the ~ tends to move the index in the
opposite direction, and these moves in opposite directions tend to cancel
out when computing incr+index the first time).
too-bad-mod-is-expensive!-ly y'rs - tim