[Python-Dev] String hash function multiplier

Tim Peters tim_one at email.msn.com
Tue Apr 13 23:40:32 EDT 2004


[Raymond]
> Does anyone have any issues with changing the hash multiplier for the
> string and Unicode hash functions?

Don't touch it unless you can prove major benefits -- it's a remarkable fact
of life that the current multiplier hasn't resulted in any real-life (but
non-contrived) pathological cases.

> Instead of 1000003, I would like to use Aho's 65599, a prime near 2**16
> that is efficiently expressible as (x << 6) + (x << 16) - x.   This
> replaces a multiply with fast adds and shifts (with the two shifts
> potentially being executed in parallel).

It's unclear why that's a good thing.  Perhaps you think shifts and adds are
faster?  I wouldn't -- the imul instruction on modern Pentiums is very fast.

It is clear why it may be a bad thing:  that it *can* be expressed as just a
couple shifts and adds makes it suspect as a good scrambler (read Knuth).

> Googling for "hash 65599" shows a long history of widespread use and
> testing without any problems.

Testing in the context of Python string hashing?  I didn't think so <wink>.
The current multiplier has been studied extensively *in* Python, both via
real-life use and via large-scale focused statistical testing (we got some
money to do that during BeOpen.com's brief life -- a surprise that came out
of that was that CRC made a terrible string hash function as the number of
input strings got large).  The right thing to compare Python's string hash
to is "the standard" Fowler-Noll-Vo string hash, which was developed
independently, but which also ended up using a large multiplier.





More information about the Python-Dev mailing list