[Python-Dev] Hash collision security issue (now public)
Christian Heimes
lists at cheimes.de
Sat Dec 31 04:24:15 CET 2011
Am 31.12.2011 03:31, schrieb Victor Stinner:
> Le 29/12/2011 14:19, Christian Heimes a écrit :
>> Perhaps the dict code is a better place for randomization.
>
> The problem is the creation of a dict with keys all having the same hash
> value. The current implementation of dict uses a linked-list. Adding a
> new item requires to compare the new key to all existing keys (compare
> the value, not the hash, which is much slower).
>
> We had to change completly how dict is implemented to be able to fix
> this issue. I don't think that we can change the dict implementation
> without breaking backward compatibility or breaking applications. Change
> the implementation would change dict properties, and applications rely
> on the properties of the current implementation.
>
> Tell me if I am wrong.
You are right and I was wrong. We can't do the randomization inside the
dict code. The randomization factor must used as initialization factor
(IV) for the hashing algorithm. At first I thought the attack used the
unique property of Python's dict implementation (perturbed hash instead
of buckets for equal hashes) but I was wrong. It took me several hours
of reading and digging into the math until I figured out my mistake.
Sorry! :)
This means we can't implement a salted dict unless the salted dict
re-implemention the hash algorithm for unicode, bytes and memoryview. I
doubt that a wise idea.
I've checked my first draft of a possible solution:
http://hg.python.org/features/randomhash/ . The pseudo RNG has to be
replaced with something better and it's missing an option to feed a
seed, too.
Christian
More information about the Python-Dev
mailing list