[Python-Dev] String hash function multiplier
tim_one at email.msn.com
Tue Apr 13 23:40:32 EDT 2004
> Does anyone have any issues with changing the hash multiplier for the
> string and Unicode hash functions?
Don't touch it unless you can prove major benefits -- it's a remarkable fact
of life that the current multiplier hasn't resulted in any real-life (but
non-contrived) pathological cases.
> Instead of 1000003, I would like to use Aho's 65599, a prime near 2**16
> that is efficiently expressible as (x << 6) + (x << 16) - x. This
> replaces a multiply with fast adds and shifts (with the two shifts
> potentially being executed in parallel).
It's unclear why that's a good thing. Perhaps you think shifts and adds are
faster? I wouldn't -- the imul instruction on modern Pentiums is very fast.
It is clear why it may be a bad thing: that it *can* be expressed as just a
couple shifts and adds makes it suspect as a good scrambler (read Knuth).
> Googling for "hash 65599" shows a long history of widespread use and
> testing without any problems.
Testing in the context of Python string hashing? I didn't think so <wink>.
The current multiplier has been studied extensively *in* Python, both via
real-life use and via large-scale focused statistical testing (we got some
money to do that during BeOpen.com's brief life -- a surprise that came out
of that was that CRC made a terrible string hash function as the number of
input strings got large). The right thing to compare Python's string hash
to is "the standard" Fowler-Noll-Vo string hash, which was developed
independently, but which also ended up using a large multiplier.
More information about the Python-Dev