[Python-Dev] Hash collision security issue (now public)

Mon Jan 2 15:32:19 CET 2012

Christian Heimes writes:
 > Am 31.12.2011 13:03, schrieb Stephen J. Turnbull:
 > > I don't know the implementation issues well enough to claim it is a
 > > solution, but this hasn't been mentioned before AFAICS:
 > > 
 > > While the dictionary probe has to start with a hash for backward
 > > compatibility reasons, is there a reason the overflow strategy for
 > > insertion has to be buckets containing lists?  How about
 > > double-hashing, etc?
 > 
 > Python's dict implementation doesn't use bucket but open addressing (aka
 > closed hashed table). The algorithm for conflict resolution doesn't use
 > double hashing. Instead it takes the original and (in most cases) cached
 > hash and perturbs the hash with a series of add, multiply and bit shift ops.

In an attack, this is still O(collisions) per probe (as any scheme
where the address of the nth collision is a function of only the
hash), where double hashing should be "roughly" O(1) (with double the
coefficient).

But that evidently imposes too large a performance burden on non-evil
users, so it's not worth thinking about whether "roughly O(1)" is
close enough to O(1) to deter or exhaust attackers.  I withdraw the
suggestion.