[Python-Dev] Hash collision security issue (now public)

Christian Heimes lists at cheimes.de
Thu Dec 29 14:32:21 CET 2011

Am 29.12.2011 13:57, schrieb Armin Ronacher:
> Hi,
> Something I should add to this now that I thought about it a bit more:
> Assuming this should be fixed on a language level the solution would
> probably be to salt hashes.  The most common hash to salt here is the
> PyUnicode hash for obvious reasons.
> - Option a: Compiled in Salt
>   + Easy to implement
>   - Breaks unittests most likely (those were broken in the first place
>     but that's still a very annoying change to make)
>   - Might cause problems with interoperability of Pythons compiled with
>     different hash salts
>   - You're not really solving the problem because each linux
>     distribution (besides Gentoo I guess) would have just one salt
>     compiled in and that would be popular enough to have the same
>     issue.
> - Option b: Environment variable for the salt
>   + Easy-ish to implement
>   + Easy to synchronize over different machines
>   - initialization for base types happens early and unpredictive which
>     makes it hard for embedded Python interpreters (think mod_wsgi and
>     other things) to specify the salt
> - Option c: Random salt at runtime
>   + Easy to implement
>   - impossible to synchronize
>   - breaks unittests in the same way as a compiled in salt would do

- Option d: Don't change __hash__ but only use randomized hash for
PyDictEntry lookup
  + Easy to implement
  - breaks only software to relies on a fixed order of dict keys
  - breaks only a few to no unit tests

IMHO we don't have to alter the outcome of hash("some string"), hash(1)
and all other related types. We just need to reduce the change the an
attacker can produce collisions in the dict (and set?) code that looks
up the slot (PyDictEntry). How about adding the random value in
Object/dictobject.c:lookdict() and lookdict_str() (Python 2.x) /
lookdict_unicode() (Python 3.x)? With this approach the hash of all our
objects stay the same and just the dict code needs to be altered. The
approach has also the benefit that all possible objects gain a
randomized hash.

> Also related: since this is a security related issue, would this be
> something that goes into Python 2?  Does that affect how a fix would
> look like?

IMHO it does affect the fix. A changed and randomized hash function may
break software that relies on a stable hash() function.


More information about the Python-Dev mailing list