Am 07.01.2012 12:02, schrieb Stefan Behnel:
Wouldn't Bob Jenkins' "lookup3" hash function fit in here? After all, it's portable, known to provide a very good distribution for different string values and is generally fast on both 32 and 64 bit architectures.
The analysis is here:
It seems that there's also support for generating 64bit hash values (actually 2x32bits) efficiently.
This thread as well as the ticket is getting so long that people barely have a chance to catch up ...
Guido has stated that he doesn't want a completely new hash algorithm for Python 2.x to 3.2. A new hash algorithm for 3.3 needs a PEP, too.
I've done some experiments with FNV and Murmur3. With Murmur3 128bit I've seen some minor speed improvements on 64bit platforms. At first I was surprised but it makes sense. Murmur3 operates on uint32_t blocks while Python's hash algorithm iterates over 1 byte (bytes, ASCII), 2 bytes (USC2) or 4 bytes (USC4) types. Since most strings are either ASCII or UCS2, the inner loop of the current algorithm is more tight.
Admittedly, this may require some adaptation for the PEP393 unicode memory layout in order to produce identical hashes for all three representations if they represent the same content. So it's not a drop-in replacement.
Is this condition required and implemented at the moment?