[Python-Dev] Status of the fix for the hash collision vulnerability
Hrvoje Niksic
hrvoje.niksic at avl.com
Wed Jan 18 11:15:49 CET 2012
On 01/17/2012 09:29 PM, "Martin v. Löwis" wrote:
> I(0) = H& MASK
> PERTURB(0) = H
> I(n+1) = (5*I(n) + 1 + PERTURB(n))& MASK
> PERTURN(n+1) = PERTURB(n)>> 5
>
> So if two objects O1 and O2 have the same hash value H, the sequence of
> probed indices is the same for any MASK value. It will be a different
> sequence, yes, but they will still collide on each and every slot.
>
> This is the very nature of open addressing.
Open addressing can still deploy a collision resolution mechanism
without this property. For example, double hashing uses a different hash
function (applied to the key) to calculate PERTURB(0). To defeat it, the
attacker would have to produce keys that hash the same using both hash
functions.
Double hashing is not a good general solution for Python dicts because
it complicates the interface of hash tables that support arbitrary keys.
Still, it could be considered for dicts with known key types (built-ins
could hardcode the alternative hash function) or for SafeDicts, if they
are still considered.
Hrvoje
More information about the Python-Dev
mailing list