Am 01.01.2012 00:56, schrieb Guido van Rossum:
ISTM the only reasonable thing is to have a random seed picked very early in the process, to be used to change the hash() function of str/bytes/unicode (in a way that they are still compatible with each other).
The seed should be unique per process except it should survive fork() (but not exec()). I'm not worried about unrelated processes needing to have the same hash(), but I'm not against offering an env variable or command line flag to force the seed.
I've created a clone at http://hg.python.org/features/randomhash/ as a testbed. The code creates the seed very early in PyInitializeEx(). The method isn't called on fork() but on exec().
I'm not too concerned about a 3rd party being able to guess the random seed -- this would require much more effort on their part, since they would have to generate a new set of colliding keys each time they think they have guessed the hash (as long as they can't force the seed -- this actually argues slightly *against* offering a way to force the seed, except that we have strong backwards compatibility requirements).
The talkers claim and have shown that it's too easy to pre-calculate collisions with hashing algorithms similar to DJBX33X / DJBX33A. It might be a good idea to change the hashing algorithm, too. Paul as listed some new algorithms. Ruby 1.9 is using FNV http://isthe.com/chongo/tech/comp/fnv/ which promises to be fast with a good dispersion pattern. A hashing algorithm without a meet-in-the-middle vulnerability would reduce the pressure on a good and secure seed, too.
We need to fix this as far back as Python 2.6, and it would be nice if a source patch was available that works on Python 2.5 -- personally I do have a need for a 2.5 fix and if nobody creates one I will probably end up backporting the fix from 2.6 to 2.5.
Should the randomization be disabled on 2.5 to 3.2 by default to reduce backward compatibility issues?