On 1/5/2012 9:34 AM, Maciej Fijalkowski wrote:
Also consider that new 2.6.x would go as a security fix to old
ubuntu, but all other packages won't, because they'll not contain
security fixes. Just so you know

Why should CPython be constrained by broken policies of Ubuntu?  If the other packages must be fixed so they work correctly with a security fix in Python, then those fixes should be considered security fixes as well. If they aren't, then that is a broken policy.

On the other hand, it is very true that the seductive convenience of dict (readily available, good performance in normal cases) has created the vulnerability: its performance characteristics are a function of the data inserted. When dict is used for data from unknown, possibly malicious sources, that is a bug in the program that uses dict, not in dict itself.
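
To make "a function of the data inserted" concrete, here is a minimal sketch that counts key comparisons while building a dict. CountingKey and comparisons_to_build are hypothetical names invented for this illustration, and forcing every key to hash to 0 is only a stand-in for attacker-chosen strings that collide under the real hash function.

```python
class CountingKey:
    """Hypothetical key type that counts how often dict calls __eq__."""

    eq_calls = 0  # shared counter, reset before each measurement

    def __init__(self, n, collide):
        self.n = n
        self.collide = collide

    def __hash__(self):
        # With collide=True every key lands in the same probe chain.
        return 0 if self.collide else hash(self.n)

    def __eq__(self, other):
        CountingKey.eq_calls += 1
        return self.n == other.n


def comparisons_to_build(n, collide):
    """Insert n distinct keys into a dict and return the number of __eq__ calls."""
    CountingKey.eq_calls = 0
    d = {}
    for i in range(n):
        d[CountingKey(i, collide)] = None
    return CountingKey.eq_calls


if __name__ == "__main__":
    for n in (100, 200, 400):
        print(n, comparisons_to_build(n, False), comparisons_to_build(n, True))
```

With well-spread hashes the comparison count stays small; with colliding hashes each insert has to be compared against the keys already in the chain, so the total work grows much faster than the number of keys.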

So it seems to me that:

1) the security problem is not in CPython, but rather in web servers that use dict inappropriately.
2) changing CPython in a way that breaks code is not a security fix to CPython, but rather a gratuitous breakage of compatibility promises, wrapped in a security-fix lie.

The problem for CPython here can be summarized as follows:

a) it is being blamed for problems in web servers that are not problems in CPython
b) perhaps the dict documentation is a bit too seductive, in not declaring that data from malicious sources could cause its performance to degrade significantly. Any programmer who has actually taken a decent set of programming classes should understand that, but there are programmers who have not taken such classes.
c) CPython provides no other mapping data structure that rivals the performance and capabilities of dict as an alternative, nor can such a data structure be written in pure Python, as the performance of dict comes not only from hashing, but also from being written in C.

The solutions could be:

A) push back on the blame: it is not a CPython problem
B) perhaps add a warning to the documentation for the naïve, untrained programmers
C) consider adding an additional data structure to the language, and mention it in the B warning for versions 3.3+.

On the other hand, the web server vulnerability could be blamed on CPython in another way:

Identify vulnerable packages in the stdlib that are likely to be used during the parsing of user-supplied data.  Ones that come to mind (Python 3.2) are:
urllib.parse (various parse* functions)  (package names differ in Python 2.x)
cgi (parse_multipart, FieldStorage)
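
As a small illustration of the exposure, the snippet below uses urllib.parse.parse_qs (Python 3) to show that the result is a plain dict whose keys come straight from the request. The query string is made up for the example.

```python
from urllib.parse import parse_qs

# A made-up query string; in a web server this comes from the client.
query = "name=alice&lang=py&lang=c"

fields = parse_qs(query)

# The field names chosen by the client become the dict keys.
print(type(fields).__name__)
print(sorted(fields))
```

Whoever sends the request picks the keys, so they also pick which hash values the dict has to deal with.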

So, fixing the vulnerable packages could be a sufficient response, rather than changing the hash function.  How to fix?  Each of those above allocates and returns a dict.  Simply have each of them allocate and return a wrapped dict, which has the following behaviors:

i) during __init__, create a local random string.
ii) for every key, prepend that string before passing the key to the internal dict.
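
The two behaviors above could be sketched roughly as follows. This is only an illustration of the mechanics: PrefixedDict is a hypothetical name, keys are assumed to be str, and the 8-byte prefix length is an arbitrary choice. Whether a prefix actually defeats crafted collisions depends on the hash function in use, which this sketch does not address.

```python
import binascii
import os
from collections.abc import MutableMapping


class PrefixedDict(MutableMapping):
    """Hypothetical wrapper that stores str keys behind a per-instance random prefix."""

    def __init__(self, *args, **kwargs):
        # (i) create a local random string, different for every instance
        self._prefix = binascii.hexlify(os.urandom(8)).decode("ascii")
        self._data = {}
        self.update(*args, **kwargs)

    def _wrap(self, key):
        # (ii) prepend the random string before the key reaches the internal dict
        return self._prefix + key

    def __getitem__(self, key):
        return self._data[self._wrap(key)]

    def __setitem__(self, key, value):
        self._data[self._wrap(key)] = value

    def __delitem__(self, key):
        del self._data[self._wrap(key)]

    def __iter__(self):
        # Strip the prefix again so callers only ever see their own keys.
        n = len(self._prefix)
        return (k[n:] for k in self._data)

    def __len__(self):
        return len(self._data)
```

Callers use it like a dict (d["field"] = value, "field" in d, for name in d), so the parsing functions could return it without changing how the result is read. Code that checks isinstance(result, dict) would notice the difference.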

Changing these vulnerable packages rather than the hash function is a much more constrained change, and wouldn't create bugs in programs that erroneously depend on the current hash function directly or indirectly.

This would not fix web servers that use their own parsing and storage mechanism for <FORM> fields, if they have also inappropriately used a dict to store user-supplied data.  However, the authors of those web servers could apply a similar solution, and it would be a security fix to such packages, so it should be applied to Ubuntu, if available there, or to other systems that accept security-only fixes.

This solution does not require changes to the hash, does not require a cryptographically secure hash, and does not require code to be added to the initialization of Python before normal objects and mappings can be created.

If a port doesn't contain a good random number generator, a weak one can be substituted, but such decisions can be made in Python code after the interpreter is initialized and the stdlib packages are available.
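
A minimal sketch of making that decision in ordinary Python code, assuming the random string is built from 8 random bytes. make_prefix is a hypothetical helper name; os.urandom raises NotImplementedError when the platform has no randomness source, and the random module is the weak substitute here.

```python
import binascii
import os
import random


def make_prefix(nbytes=8):
    """Return a random hex string, preferring the OS generator when it exists."""
    try:
        raw = os.urandom(nbytes)
    except NotImplementedError:
        # Weak fallback for ports without an OS-level randomness source.
        raw = bytes(random.getrandbits(8) for _ in range(nbytes))
    return binascii.hexlify(raw).decode("ascii")
```

Because this runs after the interpreter is up, none of it has to live in the C startup path.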