Mark Shannon wrote:
Michael Foord wrote:
A paper (well, presentation) has been published highlighting security problems with the hashing algorithm (exploiting collisions) in many programming languages Python included:
Although it's a security issue I'm posting it here because it is now public and seems important.
The issue they report can cause (for example) handling an http post to consume horrible amounts of cpu. For Python the figures they quoted:
reasonable-sized attack strings only for 32 bits Plone has max. POST size of 1 MB 7 minutes of CPU usage for a 1 MB request ~20 kbits/s → keep one Core Duo core busy
This was apparently reported to the security list, but hasn't been responded to beyond an acknowledgement on November 24th (the original report didn't make it onto the security list because it was held in a moderation queue). The same vulnerability was reported against various languages and web frameworks, and is already fixed in some of them.
Their recommended fix is to randomize the hash function.
The attack relies on being able to predict the hash value for a given string. Randomising the string hash function is quite straightforward. There is no need to change the dictionary code.
A possible (*untested*) patch is attached. I'll leave it for those more familiar with unicodeobject.c to do properly.
The paper mentions that several web frameworks work around this by limiting the number of parameters per GET/POST/HEAD request.
This sounds like a better alternative than randomizing the hash function of strings.
Uncontrollable randomization has issues when you work with multi-process setups, since the processes would each use different hash values for identical strings. Putting the base_hash value under application control could be done to solve this problem, making sure that all processes use the same random base value.
BTW: Since your randomization trick uses the current time, it would also be rather easy to tune an attack to find the currently used base_hash. To make this safe, you'd have to use a more random source for initializing the base_hash.
Note that the same hash collision attack can be used for other key types as well, e.g. integers (where it's very easy to find hash collisions), so this kind of randomization would have to be applied to other basic types too.