On 1/5/2012 9:34 AM, Maciej Fijalkowski wrote:
> Also consider that new 2.6.x would go as a security fix to old
> ubuntu, but all other packages won't, because they'll not contain
> security fixes. Just so you know
Why should CPython be constrained by broken policies of Ubuntu? If
the other packages must be fixed so they work correctly with a
security fix in Python, then they should be considered as containing
a security fix. If they aren't, then that is a broken policy.
On the other hand, it is very true that the seductive convenience of
dict (readily available, good performance in normal cases) has
created the vulnerability: its performance characteristics are a
function of the data inserted. When dict is used for data that comes
from unknown, possibly malicious sources, that is a bug in the
program that uses dict, not in dict itself.
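To make "a function of the data inserted" concrete, here is a minimal
sketch. CollidingKey and comparisons_for are hypothetical names made
up for this illustration, and forcing every key to the same hash is an
exaggeration of what an attacker achieves with crafted strings. It
counts how many equality comparisons dict performs while such keys are
inserted.

```python
class CollidingKey:
    """Hypothetical key type: every instance has the same hash."""

    eq_calls = 0  # class-wide counter of __eq__ invocations

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return 0  # all keys collide on purpose

    def __eq__(self, other):
        CollidingKey.eq_calls += 1
        return self.value == other.value


def comparisons_for(n):
    """Insert n colliding keys and return how many __eq__ calls it took."""
    CollidingKey.eq_calls = 0
    d = {}
    for i in range(n):
        d[CollidingKey(i)] = i
    return CollidingKey.eq_calls


if __name__ == "__main__":
    for n in (100, 200, 400):
        print(n, comparisons_for(n))
```

Each new key has to be compared against the keys already stored, so
doubling the number of keys should roughly quadruple the comparison
count, instead of doubling it as with well-distributed hashes.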
So it seems to me that:
1) the security problem is not in CPython, but rather in web servers
that use dict inappropriately.
2) changing CPython in a way that breaks code is not a security fix
to CPython, but rather a gratuitous breakage of compatibility
promises, wrapped in a security-fix lie.
The problem for CPython here can be summarized as follows:
a) it is being blamed for problems in web servers that are not
problems in CPython
b) perhaps dict documentation is a bit too seductive, in not
declaring that data from malicious sources could cause its
performance to degrade significantly (any programmer who has taken
a decent set of programming classes should understand that, but
there are programmers who have not taken such classes).
c) CPython provides no other mapping data structures that rival the
performance and capabilities of dict as an alternative, nor can such
data structures be written in CPython, as the performance of dict
comes not only from hashing, but also from being written in C.
The solutions could be:
A) push back on the blame: it is not a CPython problem
B) perhaps add a warning to the documentation for the naïve,
C) consider adding an additional data structure to the language, and
mention it in the B warning for versions 3.3+.
On the other hand, the web server vulnerability could be blamed on
CPython in another way:
identify vulnerable packages in the stdlib that are likely to be
used during the parsing of user-supplied data. Ones that come to
mind (Python 3.2) are:
urllib.parse (various parse* functions) (package names different in
cgi (parse_multipart, FieldStorage)
So, fixing the vulnerable packages could be a sufficient response,
rather than changing the hash function. How to fix? Each of those
above allocates and returns a dict. Simply have each of those
allocate and return a wrapped dict, which has the following
i) during __init__, create a local, random, string.
ii) for all key values, prepend the string, before passing it to the
Changing these vulnerable packages rather than the hash function is
a much more constrained change, and wouldn't create bugs in programs
that erroneously depend on the current hash function directly or
This would not fix web servers that use their own parsing and
storage mechanism for <FORM> fields, if they have also
inappropriately used a dict as their storage mechanism for user
supplied data. However, the authors of those web servers could apply
a similar solution, which would be a security fix to such packages
and so should be applied to Ubuntu, if available there, or to other
systems that accept security-only fixes.
This solution does not require changes to the hash, does not require
a cryptographically secure hash, and does not require code to be added
to the initialization of Python before normal objects and mappings
can be created.
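A minimal sketch of such a wrapped dict, following steps i) and ii)
above. The class name SaltedDict, the 8-byte salt taken from
os.urandom, and the restriction to str keys are assumptions made for
this illustration, not an existing stdlib API. Whether a random prefix
really defeats crafted collisions depends on the string hash in use;
the sketch only shows the mechanics.

```python
import os
from collections.abc import MutableMapping


class SaltedDict(MutableMapping):
    """Hypothetical dict wrapper that salts str keys per instance."""

    def __init__(self, *args, **kwargs):
        # i) during __init__, create a local, random string
        self._salt = os.urandom(8).hex()
        self._data = {}
        self.update(*args, **kwargs)

    def _salted(self, key):
        # ii) prepend the string before the key reaches the real dict
        # (assumes str keys, such as form field names)
        return self._salt + key

    def __getitem__(self, key):
        try:
            return self._data[self._salted(key)]
        except KeyError:
            raise KeyError(key) from None  # report the caller's key

    def __setitem__(self, key, value):
        self._data[self._salted(key)] = value

    def __delitem__(self, key):
        try:
            del self._data[self._salted(key)]
        except KeyError:
            raise KeyError(key) from None

    def __iter__(self):
        # strip the salt so callers only ever see their own keys
        prefix_len = len(self._salt)
        return (k[prefix_len:] for k in self._data)

    def __len__(self):
        return len(self._data)


if __name__ == "__main__":
    fields = SaltedDict()
    fields["name"] = "value"
    print(fields["name"], list(fields))
```

Callers see an ordinary mapping interface; only the keys stored in the
underlying dict differ from one instance to the next.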
If a port doesn't contain a good random number generator, a weak one
can be substituted, but such decisions can be made in Python code
after the interpreter is initialized, and use of stdlib packages is