On Wed, Dec 28, 2011 at 5:37 PM, Jesse Noller firstname.lastname@example.org wrote:
On Wednesday, December 28, 2011 at 8:28 PM, Michael Foord wrote:
A paper (well, presentation) has been published highlighting security problems with the hashing algorithm (exploiting collisions) in many programming languages Python included:
Although it's a security issue I'm posting it here because it is now public and seems important.
The issue they report can cause (for example) handling an http post to consume horrible amounts of cpu. For Python the figures they quoted:
reasonable-sized attack strings only for 32 bits Plone has max. POST size of 1 MB 7 minutes of CPU usage for a 1 MB request ~20 kbits/s → keep one Core Duo core busy
This was apparently reported to the security list, but hasn't been responded to beyond an acknowledgement on November 24th (the original report didn't make it onto the security list because it was held in a moderation queue).
The same vulnerability was reported against various languages and web frameworks, and is already fixed in some of them.
Their recommended fix is to randomize the hash function.
All the best,
Back up link for the PDF: http://dl.dropbox.com/u/1374/2007_28C3_Effective_DoS_on_web_application_plat...
Ocert disclosure: http://www.ocert.org/advisories/ocert-2011-003.html
Discussion of hash functions in general: http://burtleburtle.net/bob/hash/doobs.html Two of the best hash functions that currently exist: http://code.google.com/p/cityhash/ and http://code.google.com/p/smhasher/wiki/MurmurHash.
I'm not sure exactly what problem the paper is primarily complaining about: 1. Multiply+add and multiply+xor hashes are weak: this would be solved by changing to either of the better-and-faster hashes I linked to above. On the other hand: http://mail.python.org/pipermail/python-3000/2007-September/010327.html 2. It's possible to find collisions in any hash function in a 32-bit space: only solved by picking a varying seed at startup or compile time.
If you decide to change to a stronger hash function overall, it might also be useful to change the advice "to somehow mix together (e.g. using exclusive or) the hash values for the components" in http://docs.python.org/py3k/reference/datamodel.html#object.__hash__. hash(tuple(components)) will likely be better if tuple's hash is improved.
Hash functions are already unstable across Python versions. Making them unstable across interpreter processes (multiprocessing doesn't share dicts, right?) doesn't sound like a big additional problem. Users who want a distributed hash table will need to pull their own hash function out of hashlib or re-implement a non-cryptographic hash instead of using the built-in one, but they probably need to do that already to allow themselves to upgrade Python.