[Python-Dev] Counting collisions for the win

Mon Jan 23 22:55:47 CET 2012

Frank Sievertsen wrote:
> Hello,
> 
> I'd still prefer to see a randomized hash()-function (at least for 3.3).
> 
> But to protect against the attacks it would be sufficient to use
> randomization for collision resolution in dicts (and sets).
> 
> What if we use a second (randomized) hash-function in case there
> are many collisions in ONE lookup. This hash-function is used only
> for collision resolution and is not cached.

This sounds a lot like what I'm referring to as universal hash function
in the discussion on the ticket:

http://bugs.python.org/issue13703#msg150724
http://bugs.python.org/issue13703#msg150795
http://bugs.python.org/issue13703#msg151813

However, I don't like the term "random" in there. It's better to make
the approach deterministic to avoid issues with not being able
to easily reproduce Python application runs for debugging purposes.

If you find that the data is manipulated, simply incrementing the
universal hash parameter and rehashing the dict with that parameter
should be enough to solve the issue (if not, which is highly unlikely,
the dict will simply reapply the fix). No randomness needed.

BTW: I attached a demo script to the ticket which demonstrates both
types of collisions using integers.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Jan 23 2012)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

::: Try our new mxODBC.Connect Python Database Interface for free ! ::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/