Am 29.12.2011 21:07, schrieb PJ Eby:
I don't understand how that helps a collision attack. If you can still generate two strings with the same (pre-randomized) hash, what difference does it make that the dict adds a random number? The post-randomized number will still be the same, no?
Or does this attack just rely on the hash *remainders* being the same? If so, I can see how hashing the hash would help. But since the attacker doesn't know the modulus, and it can change as the dictionary grows, I would expect the attack to require matching hashes, not just matching hash remainders... unless I'm just completely off base here.
The attack doesn't need perfect collisions. The attacker calculates strings in a way so that their hashes results in as many collision as possible in the dict code. An attacker succeeds when the initial slot for an hash is filled and as many subsequent slots of the perturbed masked hash, too. Also an attacker can easily predict the size and therefore the mask for the hash remainder. A POST request parser usually starts with an empty dict and the growth rate of Python's dicts is well documented. The changing mask makes the attack just a tiny bit more challenging.
The hash randomization idea adds a salt to throw the attacker of course. Instead of
position = hash & mask
hash = salt + hash position = hash & mask
where salt is a random, process global value that is fixed for the life time of the program. The salt also affects the perturbance during the search for new slots. As you already stated this salt won't be affective against full hash collisions.
The attack needs A LOT of problematic strings to become an issue, possible hundred of thousands or even millions of keys in a very large POST request. In reality an attacker won't reach the full theoretical O(n^2) performance degradation for a hash table. But even more than O(n) instead of O(1) for a million keys in each request has some impact on your servers' CPUs. Some vendors have limited to POST request to 1 MB or the amount of keys to 10,000 to work around the issue. One paper also states that attacks on Python with 64bit is just academical for now.