[issue13703] Hash collision security issue

Antoine Pitrou report at bugs.python.org
Sat Jan 21 23:20:48 CET 2012


Antoine Pitrou <pitrou at free.fr> added the comment:

> I wonder if the dict statistics should be exposed with extra attributes
> or a method on the dict; e.g. a __stats__ attribute, something like
> this:
> 
> LargeDictStats(keys=58, mask=127, insertions=53, iterations=1697)
> 
> SmallDictStats(keys=3, mask=7)

Sounds a bit overkill, and it shouldn't be a public API (which
__methods__ are). Even a private API on dicts would quickly become
visible, since dicts are so pervasive.

> > > Caveats:
> > > * no overflow handling (what happens after 2**32 modifications to a
> > > long-lived dict on a 32-bit build?) - though that's fixable.
> > 
> > How do you suggest to fix it?
> 
> If the dict is heading towards overflow of these counters, it's either
> long-lived, or *huge*.
> 
> Possible approaches:
> (a) use 64-bit counters rather than 32-bit, though that's simply
> delaying the inevitable

Well, even assuming one billion lookup probes per second on a single
dictionary, the inevitable will happen in 584 years with a 64-bit
counter (but only 4 seconds with a 32-bit counter).

A real issue, though, may be the cost of 64-bit arithmetic on 32-bit
CPUs.

> (b) when one of the counters gets large, divide both of them by a
> constant (e.g. 2).  We're interested in their ratio, so dividing both by
> a constant preserves this.

Sounds good, although we may want to pull this outside of the critical
loop.

> By "a constant" do you mean from the perspective of big-O notation, or
> do you mean that it should be hardcoded (I was wondering if it should be
> a sys variable/environment variable etc?).

Hardcoded, as in your patch.

> > There needs to be a way to disable it: an environment variable would be
> > the minimum IMO.
> 
> e.g. set it to 0 to enable it, set it to nonzero to set the scale
> factor.

0 to enable it sounds misleading. I'd say:
- 0 to disable it
- 1 to enable it and use the default scaling factor
- >= 2 to enable it and set the scaling factor

> Any idea what to call it? 

PYTHONDICTPROTECTION ?
Most people should either enable or disable it, not change the scaling
factor.

> BTW, presumably if we do it, we should do it for sets as well?

Yeah, and use the same env var / sys function.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue13703>
_______________________________________


More information about the Python-bugs-list mailing list