[Python-Dev] The Dictionary Gem is polished!

Christian Tismer tismer@tismer.com
Tue, 19 Dec 2000 17:48:58 +0200


Tim Peters wrote:
> 
> Something else to ponder:  my tests show that the current ("old") algorithm
> performs much better (somewhat worse than "new2" == new algorithm + warmup)
> if incr is simply initialized like so instead:
> 
>         if mp.oldalg:
>             incr = (_hash & 0xffffffffL) % (mp.ma_size - 1)

Sure. I did this as well, but didn't consider a division
since it said to be too slow. But this is very platform
dependant. On Pentiums this might be not noticeable.

> That's another way to get all the bits to contribute to the result.  Note
> that a mod by size-1 is analogous to "casting out nines" in decimal:  it's
> the same as breaking hash into fixed-sized pieces from the right (10 bits
> each if size=2**10, etc), adding the pieces together, and repeating that
> process until only one piece remains.  IOW, it's a degenerate form of
> division, but works well all the same.  It didn't improve over that when I
> tried a mod by the largest prime less than the table size (which suggests
> we're sucking all we can out of the *probe* sequence given a sometimes-poor
> starting index).

Again I tried this too. Instead of the largest near prime I used
the nearest prime. Remarkably the nearest prime is identical
to the primitive element in a lot of cases.
But no improvement over the modulus.

> 
> However, it's subject to the same weak clustering phenomenon as the old
> method due to the ill-advised "~hash" operation in computing the initial
> index.  If ~ is also thrown away, it's as good as new2 (here I've tossed out
> the "windows names", and "old" == existing algorithm except (a) get rid of ~
> when computing index and (b) do mod by size-1 when computing incr):
...
> The new and new2 values differ in minor ways from the ones you posted
> because I got rid of the ~ (the ~ has a bad interaction with "additive"
> means of computing incr, because the ~ tends to move the index in the
> opposite direction, and these moves in opposite directions tend to cancel
> out when computing incr+index the first time).

Remarkable.

> too-bad-mod-is-expensive!-ly y'rs  - tim

Yes. The wheel is cheapest yet.

ciao - chris

-- 
Christian Tismer             :^)   <mailto:tismer@tismer.com>
Mission Impossible 5oftware  :     Have a break! Take a ride on Python's
Kaunstr. 26                  :    *Starship* http://starship.python.net
14163 Berlin                 :     PGP key -> http://wwwkeys.pgp.net
PGP Fingerprint       E182 71C7 1A9D 66E9 9D15  D3CC D4D7 93E2 1FAE F6DF
     where do you want to jump today?   http://www.stackless.com