Re: [Python-Dev] Hash collision security issue (now public)

31 Dec 2011

      Am 31.12.2011 03:31, schrieb Victor Stinner:
...
Le 29/12/2011 14:19, Christian Heimes a écrit :
...
Perhaps the dict code is a better place for randomization.
The problem is the creation of a dict with keys all having the same hash 
value. The current implementation of dict uses a linked-list. Adding a 
new item requires to compare the new key to all existing keys (compare 
the value, not the hash, which is much slower).
We had to change completly how dict is implemented to be able to fix 
this issue. I don't think that we can change the dict implementation 
without breaking backward compatibility or breaking applications. Change 
the implementation would change dict properties, and applications rely 
on the properties of the current implementation.
Tell me if I am wrong.
You are right and I was wrong. We can't do the randomization inside the
dict code. The randomization factor must used as initialization factor
(IV) for the hashing algorithm. At first I thought the attack used the
unique property of Python's dict implementation (perturbed hash instead
of buckets for equal hashes) but I was wrong. It took me several hours
of reading and digging into the math until I figured out my mistake.
Sorry! :)

This means we can't implement a salted dict unless the salted dict
re-implemention the hash algorithm for unicode, bytes and memoryview. I
doubt that a wise idea.

I've checked my first draft of a possible solution:
http://hg.python.org/features/randomhash/ . The pseudo RNG has to be
replaced with something better and it's missing an option to feed a
seed, too.

Christian