[Python-Dev] Status of the fix for the hash collision vulnerability

Mon Jan 16 23:23:40 CET 2012

> As I understand it, the way the attack works is that a *single*
> malicious request from the attacker can DoS the server by eating CPU
> resources while evaluating a massive collision chain induced in a dict
> by attacker supplied data. Explicitly truncating the collision chain
> boots them out almost immediately (likely with a 500 response for an
> internal server error), so they no longer affect other events, threads
> and processes on the same machine.

This is only true in the specific attack presented at 28c3. If an
attacker can insert data without triggering the attack, it's possible
to produce (in the example of a web application) urls that (regardless
of the request) always produce pathological behavior. For example, a
collection of pathological usernames might make it impossible to list
users (and so choose which ones to delete) without resorting to
removing the problem data at an SQL level.

This is why the "simply throw an error" solution isn't a complete fix.
Making portions of an interface unusable for regular users is clearly
a bad thing, and is clearly applicable to other types of poisoned data
as well. We need to detect collisions and work around them
transparently.

> However, such an app would have been crippled by the original DoS
> anyway, since its performance would have been gutted - the collision
> chain limiting just means it will trigger exceptions for the cases
> that would been insanely slow.

We can do better than saying "it would have been broken before, it's
broken differently now". The universal hash function idea has merit,
and for practical purposes hash randomization would fix this too
(since colliding data is only likely to collide within a single
process, persistent poisoning is far less feasible).

-Paul