[issue13703] Hash collision security issue

Martin v. Löwis report at bugs.python.org
Thu Jan 26 23:34:52 CET 2012


Martin v. Löwis <martin at v.loewis.de> added the comment:

> as soon as any key insertion or lookup occurs where the key isn't
> exactly one of the correct types, the dict flattens any AVL trees back
> into the regular flat representation (and switches to lookdict for
> ma_lookup), analogous to the existing ma_lookup transition on dicts.

Correct.

> TREE_DELETE() invocations as appropriate; ultimately, the AVL macros
> call back to within node_cmp():
>    PyObject_Compare(left->key, right->key)

Correct.

> I suspect that you can't plug arbitrary user-defined types into it,
> since there's no way to guarantee that ordering and comparison work in
> the ways that the AVL lookup requires.

That's all true. It would be desirable to automatically determine which
types also support total order in addition to hashing, alas, there is
currently no protocol for it. On the contrary, Python has moved away
of assuming that all objects form a total order.

> [thinking aloud, if a pair of
> objects don't implement comparison at the PyObject_Compare level, is it
> possible to instead simply compare the addresses of the objects?

2.x has an elaborate logic to provide a total order on objects. It
took the entire 1.x and 2.x series to fix issues with that order, only
to recognize that it is not feasible to provide one - hence the introduction
of rich comparisons and the omission of cmp in 3.x.

For the dictionary, using addresses does not work: the order of objects
needs to be consistent with equality (i.e. x < y and x == y must not
simultaneously hold, as must x == y and y < z imply that also x < z,
else the tree lookup won't find the equal keys).

> Going higher-level, I feel that there are plenty of attacks against
> pure-str/bytes dicts, and having protection against them is worthwhile -
> even if there's no direct way to use it to protect the use-case you
> describe.

Indeed. This issue doesn't need to fix *all* possible attacks using
hash collisions. Instead, it needs to cover the common case, and it
needs to allow users to rewrite their code so that they can protect
it against this family of attacks.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue13703>
_______________________________________


More information about the Python-bugs-list mailing list