[Python-Dev] Hash collision security issue (now public)
Glenn Linderman
v+python at g.nevcal.com
Thu Jan 5 20:14:51 CET 2012
On 1/5/2012 9:34 AM, Maciej Fijalkowski wrote:
> Also consider that new 2.6.x would go as a security fix to old
> ubuntu, but all other packages won't, because they'll not contain
> security fixes. Just so you know
Why should CPython be constrained by broken policies of Ubuntu? If the
other packages must be fixed so they work correctly with a security fix
in Python, then they should be considered as containing a security fix.
If they aren't, then that is a broken policy.
On the other hand, it is very true that the seductive convenience of
dict (readily available, good performance in normal cases) has created
the vulnerability: its performance characteristics are a function of
the data inserted, and using it for data from unknown, possibly
malicious sources is a bug in the program that uses dict, not in
dict itself.
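To make "a function of the data inserted" concrete, here is a minimal
sketch. CollidingKey and count_comparisons are hypothetical names made
up for this illustration; the key type forces every key to the same
hash value, which is the situation an attacker tries to approximate
with crafted strings. It counts equality comparisons instead of timing
anything, so the result does not depend on the machine.

```python
class CollidingKey:
    """Hypothetical key type: every instance hashes to the same value."""

    eq_calls = 0  # class-wide counter of equality comparisons

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return 0  # force every key onto the same probe sequence

    def __eq__(self, other):
        CollidingKey.eq_calls += 1
        return isinstance(other, CollidingKey) and self.value == other.value


def count_comparisons(n):
    """Insert n colliding keys into a dict and report how many
    equality comparisons the insertions triggered."""
    CollidingKey.eq_calls = 0
    d = {}
    for i in range(n):
        d[CollidingKey(i)] = i
    return CollidingKey.eq_calls
```

With well-distributed hashes an insert rarely needs to compare keys at
all; here each insert has to be compared against the keys already
stored, so the total work grows roughly with the square of n.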
So it seems to me that:
1) the security problem is not in CPython, but rather in web servers
that use dict inappropriately.
2) changing CPython in a way that breaks code is not a security fix to
CPython, but rather a gratuitous breakage of compatibility promises,
wrapped in a security-fix lie.
The problem for CPython here can be summarized as follows:
a) it is being blamed for problems in web servers that are not problems
in CPython
b) perhaps dict documentation is a bit too seductive, in not declaring
that data from malicious sources could cause its performance to degrade
significantly (any programmer who has actually taken a decent set of
programming classes should understand that, but there are programmers
who have not taken such classes).
c) CPython provides no other mapping data structures that rival the
performance and capabilities of dict as an alternative, nor can such
data structures be written in pure Python, as the performance of dict
comes not only from hashing, but also from being written in C.
The solutions could be:
A) push back on the blame: it is not a CPython problem
B) perhaps add a warning to the documentation for the naïve, untrained
programmers
C) consider adding an additional data structure to the language, and
mention it in the B warning for versions 3.3+.
On the other hand, the web server vulnerability could be blamed on
CPython in another way:
identify vulnerable packages in the stdlib that are likely to be used
during the parsing of user-supplied data. Ones that come to mind
(Python 3.2) are:
urllib.parse (various parse* functions) (package names different in
Python 2.x)
cgi (parse_multipart, FieldStorage)
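As a small illustration of why these functions matter here, the sketch
below uses urllib.parse.parse_qs (the Python 3 name). The query string
is made up for the example; in a web server it would come straight from
the request, so every key in the returned dict is chosen by the sender.

```python
from urllib.parse import parse_qs

# Made-up query string standing in for user-supplied request data.
query = "name=alice&tag=x&tag=y"

# parse_qs returns a plain dict mapping each field name to a list of values.
fields = parse_qs(query)

for field_name, values in fields.items():
    print(field_name, values)
```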
So, fixing the vulnerable packages could be a sufficient response,
rather than changing the hash function. How to fix? Each of those
above allocates and returns a dict. Simply have each of those allocate
and return a wrapped dict, which has the following behaviors:
i) during __init__, create a local random string.
ii) prepend that string to every key before passing it to the
internal dict.
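A minimal sketch of what (i) and (ii) could look like follows. The
class name PrefixedDict is hypothetical, it assumes string keys only
(as with form field names), and it uses the secrets module from current
Python rather than anything available in 3.2. It only illustrates the
mechanism; it makes no claim about how much protection the prefix
actually buys against a given hash function.

```python
import secrets
from collections.abc import MutableMapping


class PrefixedDict(MutableMapping):
    """Hypothetical wrapper: stores each str key under a per-instance
    random prefix, and strips the prefix again on the way out."""

    def __init__(self, *args, **kwargs):
        # (i) one random string per instance, unknown to the data's sender
        self._prefix = secrets.token_hex(8)
        self._data = {}
        self.update(*args, **kwargs)

    # (ii) every access prepends the prefix before touching the inner dict
    def __getitem__(self, key):
        return self._data[self._prefix + key]

    def __setitem__(self, key, value):
        self._data[self._prefix + key] = value

    def __delitem__(self, key):
        del self._data[self._prefix + key]

    def __iter__(self):
        # Callers see the original keys, never the prefixed ones.
        n = len(self._prefix)
        return (stored[n:] for stored in self._data)

    def __len__(self):
        return len(self._data)
```

Because it derives from MutableMapping, callers get get(), items(),
the in operator and so on for free, so code that only uses the mapping
interface should not notice the difference. Code that checks
isinstance(x, dict) would notice.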
Changing these vulnerable packages rather than the hash function is a
much more constrained change, and wouldn't create bugs in programs that
erroneously depend on the current hash function directly or indirectly.
This would not fix web servers that use their own parsing and storage
mechanism for <FORM> fields, if they have also inappropriately used a
dict as their storage mechanism for user supplied data. However, a
similar solution could be similarly applied by the authors of those web
servers, and would be a security fix to such packages, so should be
applied to Ubuntu, if available there, or other systems with
security-only fix acceptance.
This solution does not require changes to the hash, does not require a
cryptographically secure hash, and does not require code to be added to
the initialization of Python before normal objects and mappings can be
created.
If a port doesn't contain a good random number generator, a weak one can
be substituted, but such decisions can be made in Python code after the
interpreter is initialized, when stdlib packages are available.
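As a rough sketch of making that decision in Python code: the helper
name make_prefix and its strong_source parameter are hypothetical. It
relies on os.urandom raising NotImplementedError when the platform has
no randomness source, and falls back to the (weak) random module in
that case.

```python
import os
import random


def make_prefix(nbytes=8, strong_source=os.urandom):
    """Return a random hex string of 2 * nbytes characters.

    strong_source is a hypothetical hook so the fallback can be
    exercised; by default it is os.urandom.
    """
    try:
        return strong_source(nbytes).hex()
    except NotImplementedError:
        # Weak fallback for ports without a good generator.
        return "%0*x" % (nbytes * 2, random.getrandbits(nbytes * 8))
```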