[issue13703] Hash collision security issue

Martin v. Löwis report at bugs.python.org
Fri Jan 27 03:26:28 CET 2012


Martin v. Löwis <martin at v.loewis.de> added the comment:

> If I your read patch correctly, collisions will produce additional
> allocations of one distinct PyObject (i.e. AVL node) per colliding key.

That's correct.

> That's a pretty massive change in memory consumption for string dicts
> (and also in memory fragmentation and cache friendliness, probably).

That's not correct. It's not a massive change, as colliding hash values
never happen in practice, unless you are being attacked, in which case it
will be one additional PyObject for the set of all colliding keys (i.e.
one object per possible hundreds of string objects). Even including the
nodes of the tree (one per colliding node) is IMO a moderate increase
in memory usage, in order to solve the vulnerability.

It also doesn't impact memory fragmentation badly, as these objects
are allocated using the Python small objects allocator.

> The
> performance effect in most situations is likely to be negative too,
> despite the better worst-case complexity.

Compared to the status quo? Hardly. In all practical applications,
collision never happens, so none of the extra code is ever exexcuted -
except for AVL_Check invocations, which are a plain pointer
comparison.

> IMO that would be a rather controversial change for a feature release,
> let alone a bugfix or security release.

Apparently so, but it's not clear to me why that is. That change meets
all criteria of a security fix release nicely, as opposed to the proposed
changes to the hash function, which break existing code.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue13703>
_______________________________________


More information about the Python-bugs-list mailing list