[Python-Dev] Hash collision security issue (now public)

Jeffrey Yasskin jyasskin at gmail.com
Sat Dec 31 22:04:02 CET 2011


On Wed, Dec 28, 2011 at 5:37 PM, Jesse Noller <jnoller at gmail.com> wrote:
>
>
> On Wednesday, December 28, 2011 at 8:28 PM, Michael Foord wrote:
>
>> Hello all,
>>
>> A paper (well, presentation) has been published highlighting security problems with the hashing algorithm (exploiting collisions) in many programming languages Python included:
>>
>> http://events.ccc.de/congress/2011/Fahrplan/attachments/2007_28C3_Effective_DoS_on_web_application_platforms.pdf
>>
>> Although it's a security issue I'm posting it here because it is now public and seems important.
>>
>> The issue they report can cause (for example) handling an http post to consume horrible amounts of cpu. For Python the figures they quoted:
>>
>> reasonable-sized attack strings only for 32 bits Plone has max. POST size of 1 MB
>> 7 minutes of CPU usage for a 1 MB request
>> ~20 kbits/s → keep one Core Duo core busy
>>
>> This was apparently reported to the security list, but hasn't been responded to beyond an acknowledgement on November 24th (the original report didn't make it onto the security list because it was held in a moderation queue).
>>
>> The same vulnerability was reported against various languages and web frameworks, and is already fixed in some of them.
>>
>> Their recommended fix is to randomize the hash function.
>>
>> All the best,
>>
>> Michael
>>
> Back up link for the PDF:
> http://dl.dropbox.com/u/1374/2007_28C3_Effective_DoS_on_web_application_platforms.pdf
>
> Ocert disclosure:
> http://www.ocert.org/advisories/ocert-2011-003.html

Discussion of hash functions in general:
http://burtleburtle.net/bob/hash/doobs.html
Two of the best hash functions that currently exist:
http://code.google.com/p/cityhash/ and
http://code.google.com/p/smhasher/wiki/MurmurHash.

I'm not sure exactly what problem the paper is primarily complaining about:
1. Multiply+add and multiply+xor hashes are weak: this would be solved
by changing to either of the better-and-faster hashes I linked to
above. On the other hand:
http://mail.python.org/pipermail/python-3000/2007-September/010327.html
2. It's possible to find collisions in any hash function in a 32-bit
space: only solved by picking a varying seed at startup or compile
time.

If you decide to change to a stronger hash function overall, it might
also be useful to change the advice "to somehow mix together (e.g.
using exclusive or) the hash values for the components" in
http://docs.python.org/py3k/reference/datamodel.html#object.__hash__.
hash(tuple(components)) will likely be better if tuple's hash is
improved.

Hash functions are already unstable across Python versions. Making
them unstable across interpreter processes (multiprocessing doesn't
share dicts, right?) doesn't sound like a big additional problem.
Users who want a distributed hash table will need to pull their own
hash function out of hashlib or re-implement a non-cryptographic hash
instead of using the built-in one, but they probably need to do that
already to allow themselves to upgrade Python.

Jeffrey


More information about the Python-Dev mailing list