[Python-Dev] Status of the fix for the hash collision vulnerability

Sat Jan 14 04:06:00 CET 2012

On Fri, Jan 13, 2012 at 5:58 PM, Gregory P. Smith <greg at krypto.org> wrote:

>
> On Fri, Jan 13, 2012 at 5:38 PM, Guido van Rossum <guido at python.org>wrote:
>
>> On Fri, Jan 13, 2012 at 5:17 PM, Antoine Pitrou <solipsis at pitrou.net>wrote:
>>
>>> On Thu, 12 Jan 2012 18:57:42 -0800
>>> Guido van Rossum <guido at python.org> wrote:
>>> > Hm... I started out as a big fan of the randomized hash, but thinking
>>> more
>>> > about it, I actually believe that the chances of some legitimate app
>>> having
>>> > >1000 collisions are way smaller than the chances that somebody's code
>>> will
>>> > break due to the variable hashing.
>>>
>>> Breaking due to variable hashing is deterministic: you notice it as
>>> soon as you upgrade (and then you use PYTHONHASHSEED to disable
>>> variable hashing). That seems better than unpredictable breaking when
>>> some legitimate collision chain happens.
>>
>>
>> Fair enough. But I'm now uncomfortable with turning this on for bugfix
>> releases. I'm fine with making this the default in 3.3, just not in 3.2,
>> 3.1 or 2.x -- it will break too much code and organizations will have to
>> roll back the release or do extensive testing before installing a bugfix
>> release -- exactly what we *don't* want for those.
>>
>> FWIW, I don't believe in the SafeDict solution -- you never know which
>> dicts you have to change.
>>
>>
> Agreed.
>
> Of the three options Victor listed only one is good.
>
> I don't like *SafeDict*.  *-1*.  It puts the onerous on the coder to
> always get everything right with regards to data that came from outside the
> process never ending up hashed in a non-safe dict or set *anywhere*.
>  "Safe" needs to be the default option for all hash tables.
>
> I don't like the "*too many hash collisions*" exception. *-1*. It
> provides non-deterministic application behavior for data driven
> applications with no way for them to predict when it'll happen or where and
> prepare for it. It may work in practice for many applications but is simply
> odd behavior.
>
> I do like *randomly seeding the hash*. *+1*. This is easy. It can easily
> be back ported to any Python version.
>
> It is perfectly okay to break existing users who had anything depending on
> ordering of internal hash tables. Their code was already broken. We *will*provide a flag and/or environment variable that can be set to turn the
> feature off at their own peril which they can use in their test harnesses
> that are stupid enough to use doctests with order dependencies.
>

What an implementation looks like:

 http://pastebin.com/9ydETTag

some stuff to be filled in, but this is all that is really required.  add
logic to allow a particular seed to be specified or forced to 0 from the
command line or environment.  add the logic to grab random bytes.  add the
autoconf glue to disable it.  done.

-gps

> This approach worked fine for Perl 9 years ago.
> https://rt.perl.org/rt3//Public/Bug/Display.html?id=22371
>
> -gps
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20120113/3fb82673/attachment.html>