<div class="gmail_quote">On Thu, Jan 19, 2012 at 7:32 PM, Ivan Kozik <span dir="ltr"><<a href="mailto:ivan@ludios.org">ivan@ludios.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="im">On Fri, Jan 20, 2012 at 00:48, Victor Stinner<br>
<<a href="mailto:victor.stinner@haypocalc.com">victor.stinner@haypocalc.com</a>> wrote:<br>
> I propose to solve the hash collision vulnerability by counting<br>
> collisions because it does fix the vulnerability with a minor or no<br>
> impact on applications or backward compatibility. I don't see why we<br>
> should use a different fix for Python 3.3. If counting collisons<br>
> solves the issue for stable versions, it is also enough for Python<br>
> 3.3. We now know all issues of the randomized hash solution, and I<br>
> think that there are more drawbacks than advantages. IMO the<br>
> randomized hash is overkill to fix the hash collision issue.<br>
<br>
</div>I'd like to point out that an attacker is not limited to sending just<br>
one dict full of colliding keys. Given a 22ms stall for a dict full<br>
of 1000 colliding keys, and 100 such objects inside a parent object<br>
(perhaps JSON), you can stall a server for 2.2+ seconds. Going with<br>
the raise-at-1000 approach doesn't solve the problem for everyone.<br></blockquote><div><br></div><div>It's "just" a DoS attack. Those won't go away. We just need to raise the effort needed for the attacker. The original attack would cause something like 5 minutes of CPU usage per request (with a set of colliding keys that could be computed once and used to attack every Python-run website in the world). That's at least 2 orders of magnitude worse.</div>
<div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
In addition, because the raise-at-N-collisions approach raises an<br>
exception, everyone who wants to handle this error condition properly<br>
has to change their code to catch a previously-unexpected exception.<br>
(I know they're usually still better off with the fix, but why force<br>
many people to change code when you can actually fix the hashing<br>
problem?)<br></blockquote><div><br></div><div>Why would anybody need to change their code? Every web framework worth its salt has a top-level error catcher that logs the error, serves a 500 response, and possibly does other things like email the admin.</div>
<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Another issue is that even with a configurable limit, different<br>
modules can't have their own limits. One module might want a<br>
relatively safe raise-at-100, and another module creating massive<br>
dicts might want raise-at-1000. How does a developer know whether<br>
they can raise or lower the limit, given that they use a bunch of<br>
different modules?<br></blockquote><div><br></div><div>I don't think it needs to be configurable. There just needs to be a way to turn it off.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
I actually went with this stop-at-N-collisions approach by patching my<br>
CPython a few years ago, where I limiting dictobject and setobject's<br>
critical `for` loop to 100 iterations (I realize this might handle<br>
fewer than 100 collisions.) This worked fine until I tried to compile<br>
PyPy, where the translator blew up due to a massive dict.</blockquote><div><br></div><div>I think that's because your collision-counting algorithm was much more primitive than MAL's.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
This,<br>
combined with the second problem (needing to catch an exception), led<br>
me to abandon this approach and write Securetypes, which has a<br>
securedict that uses SHA-1. Not that I like this either; I think I'm<br>
happy with the randomize-hash() approach.<font color="#888888"><br></font></blockquote></div><div><br></div><div>Why did you need to catch the exception? Were you not happy with the program simply terminating with a traceback when it got attacked?</div>
<div><br></div>-- <br>--Guido van Rossum (<a href="http://python.org/~guido">python.org/~guido</a>)<br>