[Python-Dev] Counting collisions for the win

Fri Jan 20 04:48:16 CET 2012

On Thu, Jan 19, 2012 at 7:32 PM, Ivan Kozik <ivan at ludios.org> wrote:

> On Fri, Jan 20, 2012 at 00:48, Victor Stinner
> <victor.stinner at haypocalc.com> wrote:
> > I propose to solve the hash collision vulnerability by counting
> > collisions because it does fix the vulnerability with a minor or no
> > impact on applications or backward compatibility. I don't see why we
> > should use a different fix for Python 3.3. If counting collisons
> > solves the issue for stable versions, it is also enough for Python
> > 3.3. We now know all issues of the randomized hash solution, and I
> > think that there are more drawbacks than advantages. IMO the
> > randomized hash is overkill to fix the hash collision issue.
>
> I'd like to point out that an attacker is not limited to sending just
> one dict full of colliding keys.  Given a 22ms stall for a dict full
> of 1000 colliding keys, and 100 such objects inside a parent object
> (perhaps JSON), you can stall a server for 2.2+ seconds.  Going with
> the raise-at-1000 approach doesn't solve the problem for everyone.
>

It's "just" a DoS attack. Those won't go away. We just need to raise the
effort needed for the attacker. The original attack would cause something
like 5 minutes of CPU usage per request (with a set of colliding keys that
could be computed once and used to attack every Python-run website in the
world). That's at least 2 orders of magnitude worse.

In addition, because the raise-at-N-collisions approach raises an
> exception, everyone who wants to handle this error condition properly
> has to change their code to catch a previously-unexpected exception.
> (I know they're usually still better off with the fix, but why force
> many people to change code when you can actually fix the hashing
> problem?)
>

Why would anybody need to change their code? Every web framework worth its
salt has a top-level error catcher that logs the error, serves a 500
response, and possibly does other things like email the admin.

> Another issue is that even with a configurable limit, different
> modules can't have their own limits.  One module might want a
> relatively safe raise-at-100, and another module creating massive
> dicts might want raise-at-1000.  How does a developer know whether
> they can raise or lower the limit, given that they use a bunch of
> different modules?
>

I don't think it needs to be configurable. There just needs to be a way to
turn it off.

> I actually went with this stop-at-N-collisions approach by patching my
> CPython a few years ago, where I limiting dictobject and setobject's
> critical `for` loop to 100 iterations (I realize this might handle
> fewer than 100 collisions.)  This worked fine until I tried to compile
> PyPy, where the translator blew up due to a massive dict.

I think that's because your collision-counting algorithm was much more
primitive than MAL's.

> This,
> combined with the second problem (needing to catch an exception), led
> me to abandon this approach and write Securetypes, which has a
> securedict that uses SHA-1.  Not that I like this either; I think I'm
> happy with the randomize-hash() approach.
>

Why did you need to catch the exception? Were you not happy with the
program simply terminating with a traceback when it got attacked?

-- 
--Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20120119/887a6a6b/attachment.html>