[Python-Dev] Hash collision security issue (now public)

Christian Heimes lists at cheimes.de
Thu Dec 29 22:31:05 CET 2011

Am 29.12.2011 21:07, schrieb PJ Eby:
> I don't understand how that helps a collision attack.  If you can still
> generate two strings with the same (pre-randomized) hash, what
> difference does it make that the dict adds a random number?  The
> post-randomized number will still be the same, no?
> Or does this attack just rely on the hash *remainders* being the same?
>  If so, I can see how hashing the hash would help.  But since the
> attacker doesn't know the modulus, and it can change as the dictionary
> grows, I would expect the attack to require matching hashes, not just
> matching hash remainders...  unless I'm just completely off base here.

The attack doesn't need perfect collisions. The attacker calculates
strings in a way so that their hashes results in as many collision as
possible in the dict code. An attacker succeeds when the initial slot
for an hash is filled and as many subsequent slots of the perturbed
masked hash, too. Also an attacker can easily predict the size and
therefore the mask for the hash remainder. A POST request parser usually
starts with an empty dict and the growth rate of Python's dicts is well
documented. The changing mask makes the attack just a tiny bit more

The hash randomization idea adds a salt to throw the attacker of course.
Instead of

  position = hash & mask

it's now

  hash = salt + hash
  position = hash & mask

where salt is a random, process global value that is fixed for the life
time of the program. The salt also affects the perturbance during the
search for new slots. As you already stated this salt won't be affective
against full hash collisions.

The attack needs A LOT of problematic strings to become an issue,
possible hundred of thousands or even millions of keys in a very large
POST request. In reality an attacker won't reach the full theoretical
O(n^2) performance degradation for a hash table. But even more than O(n)
instead of O(1) for a million keys in each request has some impact on
your servers' CPUs. Some vendors have limited to POST request to 1 MB or
the amount of keys to 10,000 to work around the issue. One paper also
states that attacks on Python with 64bit is just academical for now.


More information about the Python-Dev mailing list