[Python-Dev] Hash collision security issue (now public)

Thu Dec 29 21:07:59 CET 2011

On Thu, Dec 29, 2011 at 8:32 AM, Christian Heimes <lists at cheimes.de> wrote:

> IMHO we don't have to alter the outcome of hash("some string"), hash(1)
> and all other related types. We just need to reduce the change the an
> attacker can produce collisions in the dict (and set?) code that looks
> up the slot (PyDictEntry). How about adding the random value in
> Object/dictobject.c:lookdict() and lookdict_str() (Python 2.x) /
> lookdict_unicode() (Python 3.x)? With this approach the hash of all our
> objects stay the same and just the dict code needs to be altered.

I don't understand how that helps a collision attack.  If you can still
generate two strings with the same (pre-randomized) hash, what difference
does it make that the dict adds a random number?  The post-randomized
number will still be the same, no?

Or does this attack just rely on the hash *remainders* being the same?  If
so, I can see how hashing the hash would help.  But since the attacker
doesn't know the modulus, and it can change as the dictionary grows, I
would expect the attack to require matching hashes, not just matching hash
remainders...  unless I'm just completely off base here.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20111229/1d848209/attachment.html>