[Python-Dev] Hash collision security issue (now public)

Glenn Linderman v+python at g.nevcal.com
Fri Jan 6 04:39:30 CET 2012

On 1/5/2012 4:10 PM, Nick Coghlan wrote:
> On Fri, Jan 6, 2012 at 8:15 AM, Serhiy Storchaka <storchaka at gmail.com> wrote:
>> 05.01.12 21:14, Glenn Linderman wrote:
>>> So, fixing the vulnerable packages could be a sufficient response,
>>> rather than changing the hash function.  How to fix?  Each of those
>>> above allocates and returns a dict.  Simply have each of those allocate
>>> and return a wrapped dict, which has the following behaviors:
>>> i) during __init__, create a local, random, string.
>>> ii) for all key values, prepend the string, before passing it to the
>>> internal dict.
>> Good idea.

Thanks for the implementation, Serhiy.  That is the sort of thing I had 
in mind, indeed.

> Not a good idea - a lot of the 3rd party tests that depend on dict
> ordering are going to be using those modules anyway,

Stats? Didn't someone post a list of tests that fail when changing the
hash? Oh, those were stdlib tests, not 3rd party tests.  I'm not sure
how to gather the stats, then.  Are you?

> so scattering our
> solution across half the standard library is needlessly creating
> additional work without really reducing the incompatibility problem.

Half the standard library?  No one has cared to augment my list of
modules, but I have seen reference to JSON in addition to cgi and
urllib.parse.  I think there are more than 6 modules in the standard
library.

> If we're going to change anything, it may as well be the string
> hashing algorithm itself.

Changing the string hashing algorithm is known (or at least no one has
argued otherwise) to be a source of backward incompatibility that will
break programs.  My proposal, and Serhiy's implementation of it, will
only break programs that have vulnerabilities.  That assumes the
implementation works or can be easily tweaked to work; I haven't
reviewed it in detail or attempted to test it.

I failed to mention one other benefit of my proposal: every web request 
would have a different random prefix, so attempting to gather info is 
futile: the next request has a different random prefix, so different 
strings would collide.
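
To make the idea concrete, here is a minimal sketch of such a wrapped
dict.  It is not Serhiy's implementation; the class name PrefixedDict,
the _wrap helper, and the 8-byte prefix length are all my own
assumptions for illustration, and I make no claim that it is a
complete defense.

```python
import os
from collections.abc import MutableMapping


class PrefixedDict(MutableMapping):
    """Hypothetical wrapper: string keys are stored with a random prefix."""

    def __init__(self, *args, **kwargs):
        # i) during __init__, create a local, random string
        #    (8 random bytes, hex-encoded; the length is an assumption)
        self._prefix = os.urandom(8).hex()
        self._data = {}
        self.update(*args, **kwargs)

    def _wrap(self, key):
        # ii) prepend the prefix to string keys before they reach the
        #     internal dict; non-string keys pass through unchanged
        return self._prefix + key if isinstance(key, str) else key

    def __getitem__(self, key):
        return self._data[self._wrap(key)]

    def __setitem__(self, key, value):
        self._data[self._wrap(key)] = value

    def __delitem__(self, key):
        del self._data[self._wrap(key)]

    def __iter__(self):
        # strip the prefix again so callers see the original keys
        n = len(self._prefix)
        for key in self._data:
            yield key[n:] if isinstance(key, str) else key

    def __len__(self):
        return len(self._data)


# Each instance (e.g. one per parsed web request) gets its own prefix.
form_a = PrefixedDict(name="x")
form_b = PrefixedDict(name="x")
```

Here form_a and form_b behave like ordinary dicts from the outside,
but the strings actually hashed by the internal dicts differ, because
each instance drew its own prefix.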

> Cheers,
> Nick.

Indeed it is nice when we can be cheery even when arguing, for the most
part :)  I've enjoyed reading the discussions in this forum because most 
folks have respect for other people's opinions, even when they differ.
