[issue13703] Hash collision security issue

Paul McMillan report at bugs.python.org
Thu Jan 5 01:31:43 CET 2012


Paul McMillan <paul at mcmillan.ws> added the comment:

This is not something that can be fixed by limiting the size of POST/GET. 

Parsing documents (even offline) can generate these problems. I can create books that calibre (a Python-based ebook format shifting tool) can't convert, but are otherwise perfectly valid for non-python devices. If I'm allowed to insert usernames into a database and you ever retrieve those in a dict, you're vulnerable. If I can post things one at a time that eventually get parsed into a dict (like the tag example), you're vulnerable. I can generate web traffic that creates log files that are unparsable (even offline) in Python if dicts are used anywhere. Any application that accepts data from users needs to be considered.

Even if the web framework has a dictionary implementation that randomizes the hashes so it's not vulnerable, the entire python standard library uses dicts all over the place. If this is a problem which must be fixed by the framework, they must reinvent every standard library function they hope to use.

Any non-trivial python application which parses data needs the fix. The entire standard library needs the fix if is to be relied upon by applications which accept data. It makes sense to fix Python.

Of course we must fix all the basic hashing functions in python, not just the string hash. There aren't that many. 

Marc-Andre:
If you look at my proposed code, you'll notice that we do more than simply shift the period of the hash. It's not trivial for an attacker to create colliding hash functions without knowing the key.

Since speed is a concern, I think that the proposal to avoid using the random hash for short strings is a good idea. Additionally, randomizing only some of the characters in longer strings will allow us to improve security without compromising speed significantly.

I suggest that we don't randomize strings shorter than 6 characters. For longer strings, we randomize the first and last 5 characters. This means we're only adding additional work to a max of 10 rounds of the hash, and only for longer strings. Collisions with the hash from short strings should be minimal.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue13703>
_______________________________________


More information about the Python-bugs-list mailing list