[Python-Dev] Hash collision security issue (now public)

Christian Heimes lists at cheimes.de
Sun Jan 1 16:48:32 CET 2012

On 01.01.2012 00:56, Guido van Rossum wrote:
> ISTM the only reasonable thing is to have a random seed picked very
> early in the process, to be used to change the hash() function of
> str/bytes/unicode (in a way that they are still compatible with each other).
> The seed should be unique per process except it should survive fork()
> (but not exec()). I'm not worried about unrelated processes needing to
> have the same hash(), but I'm not against offering an env variable or
> command line flag to force the seed.

I've created a clone at http://hg.python.org/features/randomhash/ as a
testbed. The code creates the seed very early in Py_InitializeEx(). That
function isn't called after fork(), but it runs again after exec(), so
the seed survives fork() and changes on exec().

> I'm not too concerned about a 3rd party being able to guess the random
> seed -- this would require much more effort on their part, since they
> would have to generate a new set of colliding keys each time they think
> they have guessed the hash (as long as they can't force the seed -- this
> actually argues slightly *against* offering a way to force the seed,
> except that we have strong backwards compatibility requirements).

The speakers claim, and have shown, that it's too easy to pre-calculate
collisions for hashing algorithms similar to DJBX33X / DJBX33A. It
might be a good idea to change the hashing algorithm, too. Paul has
listed some new algorithms. Ruby 1.9 is using FNV
(http://isthe.com/chongo/tech/comp/fnv/), which promises to be fast with
a good dispersion pattern. A hashing algorithm without a
meet-in-the-middle vulnerability would also reduce the pressure on
having a good and secure seed.
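
For reference, a small Python sketch of the two families mentioned
above. djbx33a() is the textbook "times 33 plus byte" hash and
fnv1a_32() is the 32-bit FNV-1a variant with the published constants.
The seed handling (XORing a seed into the FNV offset basis, passing a
start value to DJBX33A) is my assumption for illustration only; it is
not what the testbed or Ruby does, and the sketch makes no claim that
seeded FNV is secure.

```python
MASK32 = 0xFFFFFFFF

FNV32_OFFSET_BASIS = 0x811C9DC5  # published FNV 32-bit offset basis
FNV32_PRIME = 0x01000193         # published FNV 32-bit prime


def djbx33a(data: bytes, start: int = 5381) -> int:
    """DJBX33A: h = h * 33 + byte, truncated to 32 bits."""
    h = start & MASK32
    for byte in data:
        h = (h * 33 + byte) & MASK32
    return h


def fnv1a_32(data: bytes, seed: int = 0) -> int:
    """32-bit FNV-1a; the seed XOR is an assumption of this sketch."""
    h = (FNV32_OFFSET_BASIS ^ seed) & MASK32
    for byte in data:
        h ^= byte
        h = (h * FNV32_PRIME) & MASK32
    return h


# "Ez" and "FY" collide under DJBX33A: 69*33 + 122 == 70*33 + 89.
# Both strings have the same length, so the start value contributes
# the same term to both hashes and the collision holds for any start.
for start in (5381, 0, 123456789):
    assert djbx33a(b"Ez", start) == djbx33a(b"FY", start)
```

The loop at the end shows why a random start value alone does not help
DJBX33A against equal-length colliding blocks, which is the reason for
looking at the algorithm itself and not only at the seed.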

> We need to fix this as far back as Python 2.6, and it would be nice if a
> source patch was available that works on Python 2.5 -- personally I do
> have a need for a 2.5 fix and if nobody creates one I will probably end
> up backporting the fix from 2.6 to 2.5.


Should the randomization be disabled on 2.5 to 3.2 by default to reduce
backward compatibility issues?

