[Python-Dev] Hash collision security issue (now public)

Glenn Linderman v+python at g.nevcal.com
Thu Jan 5 20:14:51 CET 2012

On 1/5/2012 9:34 AM, Maciej Fijalkowski wrote:
> Also consider that new 2.6.x would go as a security fix to old
> ubuntu, but all other packages won't, because they'll not contain
> security fixes. Just so you know

Why should CPython be constrained by the broken policies of Ubuntu?  If the 
other packages must be fixed so they work correctly with a security fix 
in Python, then those fixes should be considered security fixes too. 
If they aren't, then that is a broken policy.

On the other hand, it is very true that the seductive convenience of 
dict (readily available, with good performance in normal cases) has 
created the vulnerability, because dict's performance characteristics 
are a function of the data inserted.  Using dict to store data from 
unknown, possibly malicious sources is a bug in the program that uses 
dict, not in dict itself.

So it seems to me that:

1) the security problem is not in CPython, but rather in web servers 
that use dict inappropriately.
2) changing CPython in a way that breaks code is not a security fix to 
CPython, but rather a gratuitous breakage of compatibility promises, 
wrapped in a security-fix lie.

The problem for CPython here can be summarized as follows:

a) it is being blamed for problems in web servers that are not problems 
in CPython
b) perhaps dict documentation is a bit too seductive, in not declaring 
that data from malicious sources could cause its performance to degrade 
significantly (then again, any programmer who has taken a decent set of 
programming classes should understand that; on the other hand, not 
every programmer has taken such classes).
c) CPython provides no other mapping data structure that rivals the 
performance and capabilities of dict as an alternative, nor can such a 
data structure be written in Python code running on CPython, because the 
performance of dict comes not only from hashing, but also from being 
written in C.
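To make point b) concrete, here is a small sketch that counts how much work a dict does when keys collide.  The `Key` class and `count_comparisons` helper are hypothetical, made up for this illustration: a constant `__hash__` stands in for a batch of distinct strings that an attacker has crafted to share one hash value, and the number of `__eq__` calls is used as a proxy for running time.

```python
class Key:
    """Hypothetical key type whose hash value is chosen by the caller.

    Giving every key the same hash stands in for a batch of distinct
    strings that an attacker has crafted to collide.
    """

    eq_calls = 0  # how many times a dict fell back to __eq__

    def __init__(self, value, hash_value):
        self.value = value
        self.hash_value = hash_value

    def __hash__(self):
        return self.hash_value

    def __eq__(self, other):
        Key.eq_calls += 1
        return isinstance(other, Key) and self.value == other.value


def count_comparisons(keys):
    """Insert keys into a fresh dict; return the number of __eq__ calls."""
    Key.eq_calls = 0
    table = {}
    for key in keys:
        table[key] = True
    return Key.eq_calls


n = 500
spread = count_comparisons(Key(i, i) for i in range(n))      # all hashes differ
colliding = count_comparisons(Key(i, 42) for i in range(n))  # one shared hash
print(spread, colliding)
```

On CPython, `spread` is 0, since no two keys share a hash and no equality check is ever needed, while `colliding` is at least n(n-1)/2 (124,750 for n = 500), because each new key has to be compared with every key already stored.  Doubling n roughly quadruples that count; this quadratic behaviour is what a flood of colliding keys exploits.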

The solutions could be:

A) push back on the blame: it is not a CPython problem
B) perhaps add a warning to the documentation for the naïve, untrained programmer
C) consider adding an additional data structure to the language, and 
mention it in the B warning for versions 3.3+.

On the other hand, the web server vulnerability could be blamed on 
CPython in another way:

identify vulnerable packages in the stdlib that are likely to be used 
during the parsing of user-supplied data.  Ones that come to mind 
(Python 3.2) are:
urllib.parse (various parse* functions; the module names differ in 
Python 2.x)
cgi (parse_multipart, FieldStorage)
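As an example of why these functions are exposed, the Python 3 `urllib.parse.parse_qs` returns a plain dict whose keys are the field names taken verbatim from the query string, so whoever writes the URL chooses both the keys and how many there are.  A minimal illustration:

```python
from urllib.parse import parse_qs

# Field names on the left of each '=' come straight from the request.
fields = parse_qs("name=guido&lang=python&lang=c")
print(fields)  # {'name': ['guido'], 'lang': ['python', 'c']}

# With the default arguments, every distinct field name in the query
# string becomes one key in the returned dict.
flood = "&".join("k%d=v" % i for i in range(1000))
print(len(parse_qs(flood)))  # 1000
```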

So, fixing the vulnerable packages could be a sufficient response, 
rather than changing the hash function.  How to fix?  Each of those 
above allocates and returns a dict.  Simply have each of those allocate 
and return a wrapped dict, which has the following behaviors:

i) during __init__, create a local, random string.
ii) for every key, prepend that string before passing the key to the 
internal dict.
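The wrapped dict described in i) and ii) might be sketched roughly as follows.  `SaltedDict` is a hypothetical name; the sketch assumes str keys only and Python 3.6+ for the `secrets` module, and it builds on `collections.abc.MutableMapping` so that `get`, `items`, `update` and `in` come for free from the five methods defined here.  It only illustrates the mechanics of the proposal: whether a per-instance prefix actually defeats precomputed collisions depends on how the underlying hash mixes the prefix into the rest of the key.

```python
import secrets
from collections.abc import MutableMapping


class SaltedDict(MutableMapping):
    """Hypothetical wrapper sketching the proposal: str keys only."""

    def __init__(self, *args, **kwargs):
        # i) a per-instance random string, unknown to whoever supplies keys
        self._salt = secrets.token_hex(16)
        self._data = {}
        self.update(*args, **kwargs)

    def _salted(self, key):
        if not isinstance(key, str):
            raise TypeError("SaltedDict keys must be str")
        # ii) prepend the random string before the key reaches the inner dict
        return self._salt + key

    def __getitem__(self, key):
        return self._data[self._salted(key)]

    def __setitem__(self, key, value):
        self._data[self._salted(key)] = value

    def __delitem__(self, key):
        del self._data[self._salted(key)]

    def __iter__(self):
        # Strip the prefix so callers only ever see their original keys.
        prefix_len = len(self._salt)
        return (key[prefix_len:] for key in self._data)

    def __len__(self):
        return len(self._data)


form = SaltedDict(name="guido")
form["lang"] = "python"
print(sorted(form))  # ['lang', 'name']
```

Because the prefix is stripped on iteration, code that receives such an object from a parser can use it like an ordinary mapping; only the internal dict ever sees the prefixed keys.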

Changing these vulnerable packages rather than the hash function is a 
much more constrained change, and wouldn't create bugs in programs that 
erroneously depend on the current hash function directly or indirectly.

This would not fix web servers that use their own parsing and storage 
mechanism for <FORM> fields, if they have also inappropriately used a 
dict as their storage mechanism for user-supplied data.  However, the 
authors of those web servers could apply a similar solution, which would 
be a security fix to such packages and so should be accepted into Ubuntu 
(if the package is available there) or other systems that accept only 
security fixes.

This solution does not require changes to the hash, does not require a 
cryptographically secure hash, and does not require code to be added to 
the initialization of Python before normal objects and mappings can be 
used.

If a port doesn't contain a good random number generator, a weak one can 
be substituted, but such decisions can be made in Python code after the 
interpreter is initialized and the stdlib packages are available.
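Such a fallback, decided in ordinary Python code after startup, could look like the sketch below.  `make_salt` is a hypothetical helper; it relies on the documented behaviour that `os.urandom` raises `NotImplementedError` when no OS randomness source is found, and uses the `random` module (a Mersenne Twister, predictable to anyone who can recover its state) as the weak substitute.

```python
import os
import random


def make_salt(nbytes=16):
    """Hypothetical helper: return 2*nbytes hex digits of salt."""
    try:
        # Preferred: the operating system's randomness source.
        return os.urandom(nbytes).hex()
    except NotImplementedError:
        # Weak fallback for ports without one: not cryptographically secure.
        return "%0*x" % (nbytes * 2, random.getrandbits(nbytes * 8))


print(len(make_salt()))  # 32
```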