[Python-Dev] Hash collision security issue (now public)
Jim Jewett
jimjjewett at gmail.com
Sat Dec 31 02:04:39 CET 2011
In http://mail.python.org/pipermail/python-dev/2011-December/115138.html,
Christian Heimes
pointed out that
> ... we don't have to alter the outcome of hash ... We just need to reduce the chance that
> an attacker can produce collisions in the dict (and set?)
I'll state it more strongly. hash probably should not change (at
least for this), but we may
want to consider a different conflict resolution strategy when the
first slot is already filled.
Remember that there was a fair amount of thought and timing effort put
into selecting the
current strategy; it is deliberately sub-optimal for random input, in
order to do better with
typical input. <
http://hg.python.org/cpython/file/7010fa9bd190/Objects/dictnotes.txt >
If there is a change, it would currently be needed in three places for
each of set and dict
(the lookdict functions and insertdict_clean). It may be worth adding
some macros just to
keep those six in sync. Once those macros are in place, that allows a
compile-time switch.
My personal opinion is that accepting *and parsing* enough data for
this to be a problem
is enough of an edge case that I don't want normal dicts slowed down
at all for this; I would
therefore prefer that the change be restricted to such a compile-time
switch, with current
behavior the default.
http://hg.python.org/cpython/file/7010fa9bd190/Objects/dictobject.c#l571
583 for (perturb = hash; ep->me_key != NULL; perturb >>= PERTURB_SHIFT) {
584 i = (i << 2) + i + perturb + 1;
PERTURB_SHIFT is already a private #define to 5; per dictnotes, 4 and 6 perform
almost as well. Someone worried can easily make that change today,
and be protected
from "generic" anti-python attacks.
I believe the salt suggestions have equivalent to replacing
perturb = hash;
with something like perturb = hash + salt;
Changing i = (i << 2) + i + perturb + 1; would allow
effectively replacing the initial hash,
but risks spoiling performance in the non-adversary case.
Would there be objections to replacing those two lines with something like:
for (perturb = FIRST_PERTURB(hash, key);
ep->me_key != NULL;
perturb = NEXT_PERTURB(hash, key, perturb)) {
i = NEXT_SLOT(i, perturb);
The default macro definitions should keep things as they are
#define FIRST_PERTURB(hash, key) hash
#define NEXT_PERTURB(hash, key, perturb) perturb >> PERTURB_SHIFT
#define NEXT_SLOT(i, perturb) (i << 2) + i + perturb + 1
while allowing #ifdefs for (slower but) safer things like adding a
salt, or even using
alternative hashes.
-jJ
More information about the Python-Dev
mailing list