[Python-Dev] Counting collisions for the win

Nick Coghlan ncoghlan at gmail.com
Fri Jan 20 06:15:16 CET 2012

On Fri, Jan 20, 2012 at 2:00 PM, Steven D'Aprano <steve at pearwood.info> wrote:
> With a limit of 35 collisions, it only takes 35 keys to to force a dict to
> raise an exception, if you are an attacker able to select colliding keys.
> We're trying to defend against an attacker who is able to force collisions,
> not one who is waiting for accidental collisions. I don't see that causing
> the dict to raise an exception helps matters: it just changes the attack
> from "keep the dict busy indefinitely" to "cause an exception and crash the
> application".

No, that's fundamentally misunderstanding the nature of the attack.
The reason the hash collision attack is a problem is because it allows
you to DoS a web service in a way that requires minimal client side
resources but can have a massive effect on the server. The attacker is
making a single request that takes the server an inordinately long
time to process, consuming CPU resources all the while, and likely
preventing the handling of any other requests (especially for an
event-based server, since the attack is CPU based, bypassing all use
of asynchronous IO).

With the 1000 collision limit in place, the attacker sends their
massive request, the affected dict quickly hits the limit, throws an
unhandled exception which is then caught by the web framework and
turned into a 500 Error response (or whatever's appropriate for the
protocol being attacked).

If a given web service doesn't *already* have a catch all handler to
keep an unexpected exception from bringing the entire service down,
then DoS attacks like this one are the least of its worries.

As for why other languages haven't gone this way, I have no idea.
There are lots of details relating to a language's hash and hash map
design that will drive how suitable randomisation is as an answer, and
it also depends greatly on how you decide to characterise the threat.

FWIW, Victor's analysis in the opening post of this thread matches the
conclusions I came to a few days ago, although he's been over the
alternatives far more thoroughly than I have.


Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

More information about the Python-Dev mailing list