[Python-Dev] Hash collision security issue (now public)

Sat Dec 31 04:24:15 CET 2011

Am 31.12.2011 03:31, schrieb Victor Stinner:
> Le 29/12/2011 14:19, Christian Heimes a écrit :
>> Perhaps the dict code is a better place for randomization.
> 
> The problem is the creation of a dict with keys all having the same hash 
> value. The current implementation of dict uses a linked-list. Adding a 
> new item requires to compare the new key to all existing keys (compare 
> the value, not the hash, which is much slower).
> 
> We had to change completly how dict is implemented to be able to fix 
> this issue. I don't think that we can change the dict implementation 
> without breaking backward compatibility or breaking applications. Change 
> the implementation would change dict properties, and applications rely 
> on the properties of the current implementation.
> 
> Tell me if I am wrong.

You are right and I was wrong. We can't do the randomization inside the
dict code. The randomization factor must used as initialization factor
(IV) for the hashing algorithm. At first I thought the attack used the
unique property of Python's dict implementation (perturbed hash instead
of buckets for equal hashes) but I was wrong. It took me several hours
of reading and digging into the math until I figured out my mistake.
Sorry! :)

This means we can't implement a salted dict unless the salted dict
re-implemention the hash algorithm for unicode, bytes and memoryview. I
doubt that a wise idea.

I've checked my first draft of a possible solution:
http://hg.python.org/features/randomhash/ . The pseudo RNG has to be
replaced with something better and it's missing an option to feed a
seed, too.

Christian