[Python-Dev] Hash collision security issue (now public)

Mon Jan 2 06:55:52 CET 2012

I fixed a couple things in my proposed algorithm:
https://gist.github.com/0a91e52efa74f61858b5

I had a typo, and used 21 instead of 12 for the size multiplier. We
definitely don't need 2MB random data.

The initialization of r was broken. Now it is an array of ints, so
there's no conversion when it's used. I've adjusted it so there's 8k
of random data, broken into 2048 ints.

I added a length-based seed to the initial value of x. This prevents
single-characters from being used to enumerate raw values from r.
This is similar to the change proposed by Christian Heimes.

Most importantly, I moved the xor with r[x % len_r] down a line.
Before, it wasn't being applied to the last character.

> Christian Heimes said:
> We also need to special case short strings. The above routine hands over
> the seed to attackers, if he is able to retrieve lots of single
> character hashes.

The updated code always includes at least 2 lookups from r, which I
believe solves the single-character enumeration problem. If we
special-case part of our hash function for short strings, we may get
suboptimal collisions between the two types of hashes.

I think Ruby uses FNV-1 with a salt, making it less vulnerable to
this. FNV is otherwise similar to our existing hash function.

For the record, cryptographically strong hash functions are in the
neighborhood of 400% slower than our existing hash function.

> Terry Reedy said:
> I understood Alexander Klink and Julian Wälde, hashDoS at alech.de, as saying
> that they consider that using a random non-zero start value is sufficient to
> make the hash non-vulnerable.

I've been talking to them. They're happy to look at our proposed
changes. They indicate that a non-zero start value is sufficient to
prevent the attack, but D. J. Bernstein disagrees with them. He also
has indicated a willingness to look at our solution.

-Paul