[Python-Dev] Hash collision security issue (now public)

Armin Ronacher armin.ronacher at active-4.com
Thu Dec 29 13:57:07 CET 2011


Something I should add to this now that I have thought about it a bit
more:

Assuming this should be fixed at the language level, the solution would
probably be to salt hashes.  The most obvious hash to salt is the
PyUnicode hash, since strings are the most common dictionary keys.

- Option a: Compiled in Salt
  + Easy to implement
  - Most likely breaks unit tests (tests that depend on hash values
    were broken in the first place, but it is still a very annoying
    change to make)
  - Might cause problems with interoperability of Pythons compiled with
    different hash salts
  - You're not really solving the problem because each Linux
    distribution (besides Gentoo I guess) would have just one salt
    compiled in and that would be popular enough to have the same

- Option b: Environment variable for the salt
  + Easy-ish to implement
  + Easy to synchronize over different machines
  - initialization for base types happens early and unpredictably, which
    makes it hard for embedded Python interpreters (think mod_wsgi and
    other things) to specify the salt

- Option c: Random salt at runtime
  + Easy to implement
  - impossible to synchronize
  - breaks unit tests in the same way as a compiled-in salt would
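
A minimal sketch of how options b and c could fit together, written in
pure Python rather than C.  The variable name DEMO_HASH_SALT, both
function names, and the way the salt is mixed into the initial state are
assumptions made for illustration only; this shows where a salt would
enter the computation and is not a vetted defence.

```python
import os
import struct

MASK = (1 << 64) - 1  # emulate a 64-bit unsigned C integer


def choose_salt(environ=os.environ, var="DEMO_HASH_SALT"):
    """Pick a salt: option b if the (hypothetical) variable is set,
    otherwise option c."""
    value = environ.get(var)
    if value is not None:
        # Option b: every machine given the same value agrees on hashes.
        return int(value) & MASK
    # Option c: a fresh random salt per process, impossible to synchronize.
    return struct.unpack("<Q", os.urandom(8))[0]


def salted_hash(data, salt):
    """Multiplicative hash over a bytes object, with the salt mixed into
    the initial state (an illustrative choice, not an actual patch)."""
    if not data:
        return 0
    x = (salt ^ (data[0] << 7)) & MASK
    for byte in data:
        x = ((1000003 * x) ^ byte) & MASK
    return x ^ len(data)
```

With a fixed salt the hash is reproducible, which is what unit tests and
multi-machine setups would rely on; with a random salt, hash values
change from one process to the next.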

Where should the salt be added?  Unicode strings and bytestrings (bytes
objects), I guess, since those are the most common offenders.  Tuples
are sometimes used as dictionary keys, but in that case the hash of the
string in the tuple contributes to the tuple's hash anyway.
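
A rough sketch of the tuple point, under simplifying assumptions: the
two functions below are toy stand-ins written for this example, not the
interpreter's real hash routines.  The tuple hash has no salt of its own
and only combines the hashes of its elements, yet it still changes when
the string salt changes.

```python
MASK = (1 << 64) - 1  # emulate a 64-bit unsigned C integer


def str_hash(text, salt):
    """Toy salted string hash: the salt is the initial state."""
    x = salt & MASK
    for byte in text.encode("utf-8"):
        x = ((1000003 * x) ^ byte) & MASK
    return x


def tuple_hash(items, salt):
    """Toy tuple hash: combines element hashes only. The salt is passed
    through to str_hash and is not mixed in at the tuple level."""
    x = 0x345678
    for item in items:
        x = ((x ^ str_hash(item, salt)) * 1000003) & MASK
    return x
```

Here tuple_hash(("a",), 1) and tuple_hash(("a",), 2) differ even though
the tuple-level code is unsalted, because every step applied to the
element hash is reversible.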

Also related: since this is a security issue, would the fix be
something that goes into Python 2?  Does that affect what a fix would
look like?


