[Python-Dev] Hash collision security issue (now public)

M.-A. Lemburg mal at egenix.com
Thu Dec 29 13:49:44 CET 2011

Mark Shannon wrote:
> Michael Foord wrote:
>> Hello all,
>> A paper (well, presentation) has been published highlighting security problems with the hashing
>> algorithm (exploiting collisions) in many programming languages Python included:
>> http://events.ccc.de/congress/2011/Fahrplan/attachments/2007_28C3_Effective_DoS_on_web_application_platforms.pdf
>> Although it's a security issue I'm posting it here because it is now public and seems important.
>> The issue they report can cause (for example) handling an http post to consume horrible amounts of
>> cpu. For Python the figures they quoted:
>>     reasonable-sized attack strings only for 32 bits Plone has max. POST size of 1 MB
>>     7 minutes of CPU usage for a 1 MB request
>>     ~20 kbits/s → keep one Core Duo core busy
>> This was apparently reported to the security list, but hasn't been responded to beyond an
>> acknowledgement on November 24th (the original report didn't make it onto the security list
>> because it was held in a moderation queue).
>> The same vulnerability was reported against various languages and web frameworks, and is already
>> fixed in some of them.
>> Their recommended fix is to randomize the hash function.
> The attack relies on being able to predict the hash value for a given string. Randomising the string
> hash function is quite straightforward.
> There is no need to change the dictionary code.
> A possible (*untested*) patch is attached. I'll leave it for those more familiar with
> unicodeobject.c to do properly.

The paper mentions that several web frameworks work around this by
limiting the number of parameters per GET/POST/HEAD request.

This sounds like a better alternative than randomizing the hash
function of strings.

Uncontrollable randomization has issues when you work with
multi-process setups, since the processes would each use different
hash values for identical strings. Putting the base_hash value
under application control could be done to solve this problem,
making sure that all processes use the same random base value.

BTW: Since your randomization trick uses the current time, it would
also be rather easy to tune an attack to find the currently
used base_hash. To make this safe, you'd have to use a more
random source for initializing the base_hash.

Note that the same hash collision attack can be used for
other key types as well, e.g. integers (where it's very easy
to find hash collisions), so this kind of randomization
would have to be applied to other basic types too.

Marc-Andre Lemburg

Professional Python Services directly from the Source  (#1, Dec 29 2011)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/

::: Try our new mxODBC.Connect Python Database Interface for free ! ::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611

More information about the Python-Dev mailing list