[Python-Dev] Hash collision security issue (now public)

PJ Eby pje at telecommunity.com
Sat Dec 31 19:04:28 CET 2011

On Sat, Dec 31, 2011 at 7:03 AM, Stephen J. Turnbull <stephen at xemacs.org>wrote:

> While the dictionary probe has to start with a hash for backward
> compatibility reasons, is there a reason the overflow strategy for
> insertion has to be buckets containing lists?  How about
> double-hashing, etc?

This won't help, because the keys still have the same hash value. ANYTHING
you do to them after they're generated will result in them still colliding.

The *only* thing that works is to change the hash function in such a way
that the strings end up with different hashes in the first place.
 Otherwise, you'll still end up with (deliberate) collisions.

(Well, technically, you could use trees or some other O log n data
structure as a fallback once you have too many collisions, for some value
of "too many".  Seems a bit wasteful for the purpose, though.)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20111231/72554d42/attachment.html>

More information about the Python-Dev mailing list