[issue13703] Hash collision security issue

Tue Jan 24 01:14:31 CET 2012

Paul McMillan <paul at mcmillan.ws> added the comment:

> I think you're asking a bit much here :-) A broken app is a broken
> app, no matter how nice Python tries to work around it. If an
> app puts too much trust into user data, it will be vulnerable
> one way or another and regardless of how the user data enters
> the app.

I notice your patch doesn't include fixes for the entire standard
library to work around this problem. Were you planning on writing
those, or leaving that for others?

As a developer, I honestly don't know how I can state with certainty
that input data is clean or not, until I actually see the error you
propose. I can't check validity before the fact, the way I can check
for invalid unicode before storing it in my database. Once I see the
error (probably only after my application is attacked, certainly not
during development), it's too late. My application can't know which
particular data triggered the error, so it can't delete it. I'm
reduced to trial and error to remove the offending data, or to writing
code that never stores more than 1000 things in a dictionary. And I
have to accept that the standard library may not work on any
particular data I want to process, and must write code that detects
the error state and somehow magically removes the offending data.

The alternative, randomization, simply means that my dictionary
ordering is not stable, something that is already the case.

While I appreciate that the counting approach feels cleaner;
randomization is the only solution that makes practical sense.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue13703>
_______________________________________