Re: [Python-Dev] Hash collision security issue (now public)

7 Jan 2012

On 07.01.2012 12:02, Stefan Behnel wrote:
...
Wouldn't Bob Jenkins' "lookup3" hash function fit in here? After all, it's
portable, known to provide a very good distribution for different string
values and is generally fast on both 32- and 64-bit architectures.
http://burtleburtle.net/bob/c/lookup3.c
The analysis is here:
http://burtleburtle.net/bob/hash/doobs.html
It seems that there's also support for generating 64-bit hash values
(actually 2x32 bits) efficiently.
This thread, as well as the ticket, is getting so long that people barely
have a chance to catch up...

Guido has stated that he doesn't want a completely new hash algorithm
for Python 2.x to 3.2. A new hash algorithm for 3.3 needs a PEP, too.

I've done some experiments with FNV and Murmur3. With 128-bit Murmur3
I've seen some minor speed improvements on 64-bit platforms. At first I
was surprised, but it makes sense: Murmur3 operates on uint32_t blocks,
while Python's hash algorithm iterates over 1-byte (bytes, ASCII), 2-byte
(UCS2) or 4-byte (UCS4) units. Since most strings are either ASCII or
UCS2, the inner loop of the current algorithm is tighter.
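To make the per-unit versus per-block difference concrete, here is a minimal Python sketch of the current style of loop. It is modelled on the pre-randomization CPython 2.x string hash (x = (1000003*x) ^ c, one byte per iteration) and assumes a 64-bit C long; the 2.x unicode hash applies the same loop to 2- or 4-byte Py_UNICODE units. The names `old_string_hash`, `to_signed64` and `loop_iterations` are illustrative only, not CPython API.

```python
MASK64 = (1 << 64) - 1  # emulate 64-bit C arithmetic (assumption: 64-bit long)


def to_signed64(x: int) -> int:
    """Interpret the low 64 bits of x as a signed 64-bit C long."""
    x &= MASK64
    return x - (1 << 64) if x >= (1 << 63) else x


def old_string_hash(data: bytes) -> int:
    """Per-code-unit multiply/xor loop, modelled on the pre-randomization
    CPython 2.x string hash (illustrative sketch, not the C source)."""
    if not data:
        return 0
    x = (data[0] << 7) & MASK64
    for unit in data:  # one byte per iteration, no wide loads
        x = ((1000003 * x) ^ unit) & MASK64
    x ^= len(data)
    h = to_signed64(x)
    return -2 if h == -1 else h  # -1 is reserved as an error value in the C API


def loop_iterations(n_bytes: int, step: int) -> int:
    """Main-loop iterations needed to consume n_bytes, step bytes at a time.
    Ignores the tail handling that a block hash such as Murmur3 performs."""
    return -(-n_bytes // step)


print(old_string_hash(b"a"))                           # 12416037344
print(loop_iterations(20, 1), loop_iterations(20, 4))  # 20 5
```

The iteration count is only a rough proxy: a block hash does more work per iteration and still has to mix in a tail of up to three leftover bytes.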
...
Admittedly, this may require some adaptation for the PEP 393 Unicode memory
layout in order to produce identical hashes for all three representations
if they represent the same content. So it's not a drop-in replacement.
Is this condition required and implemented at the moment?
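One way to picture the condition in question: if a hash is computed over code points, the 1-, 2- and 4-byte layouts agree, whereas a hash fed the raw storage bytes sees the zero padding and does not. The sketch below is a hypothetical illustration; `fold_hash`, `store`, `code_points` and `CODECS` are made-up names, and the latin-1/UTF-16-LE/UTF-32-LE codecs are only a stand-in for PEP 393's UCS1/UCS2/UCS4 kinds (real buffers use native byte order).

```python
MASK64 = (1 << 64) - 1

# Hypothetical stand-in for PEP 393's 1-, 2- and 4-byte string kinds.
CODECS = {1: "latin-1", 2: "utf-16-le", 4: "utf-32-le"}


def fold_hash(units: list[int]) -> int:
    """Multiply/xor loop over a sequence of integer units (unsigned 64-bit)."""
    if not units:
        return 0
    x = (units[0] << 7) & MASK64
    for u in units:
        x = ((1000003 * x) ^ u) & MASK64
    return x ^ len(units)


def store(text: str, kind: int) -> bytes:
    """Pack every code point of text into `kind` bytes (little-endian)."""
    return text.encode(CODECS[kind])


def code_points(buf: bytes, kind: int) -> list[int]:
    """Read a packed buffer back as one integer per code point."""
    return [int.from_bytes(buf[i:i + kind], "little")
            for i in range(0, len(buf), kind)]


text = "spam"
# Hashing code points: identical for every storage kind.
by_code_point = {k: fold_hash(code_points(store(text, k), k)) for k in CODECS}
# Hashing raw storage bytes: the zero padding changes the result.
by_raw_bytes = {k: fold_hash(list(store(text, k))) for k in CODECS}

print(len(set(by_code_point.values())))  # 1
```

This is why a block hash over the raw buffer would need either a per-kind code path or a widening step to keep hashes equal for equal strings.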

Christian

Christian Heimes