[Python-Dev] Hash collision security issue (now public)

Wed Jan 4 08:59:15 CET 2012

On Wed, Jan 4, 2012 at 12:02 AM, Bill Janssen <janssen at parc.com> wrote:
> Christian Heimes <lists at cheimes.de> wrote:
>
>> Am 29.12.2011 12:13, schrieb Mark Shannon:
>> > The attack relies on being able to predict the hash value for a given
>> > string. Randomising the string hash function is quite straightforward.
>> > There is no need to change the dictionary code.
>> >
>> > A possible (*untested*) patch is attached. I'll leave it for those more
>> > familiar with unicodeobject.c to do properly.
>>
>> I'm worried that hash randomization of str is going to break 3rd party
>> software that rely on a stable hash across multiple Python instances.
>> Persistence layers like ZODB and cross interpreter communication
>> channels used by multiprocessing may (!) rely on the fact that the hash
>> of a string is fixed.
>
> Software that depends on an undefined hash function for synchronization
> and persistence deserves to break, IMO.  There are plenty of
> well-defined hash functions available for this purpose.
>
> Bill
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: http://mail.python.org/mailman/options/python-dev/fijall%40gmail.com

A lot of software will break their tests, because dict ordering would
depend on the particular run. I know, because some of them break on
pypy which has a different dict ordering. This is probably a good
thing in general, but is it really worth it? People will install
python 2.6.newest and stuff *will* break.

Is it *really* a security issue? We knew all along that dicts are
O(n^2) in worst case scenario, how is this suddenly a security
problem?

Cheers,
fijal