[Python-Dev] Hash collision security issue (now public)

Jim Jewett jimjjewett at gmail.com
Mon Jan 2 00:28:02 CET 2012

Steven D'Aprano (in

> By compile-time, do you mean when the byte-code is compilated, i.e. just
> before runtime, rather than a switch when compiling the Python executable from
> source?

No.  I really mean when the C code is initially compiled to produce an
python executable.

The only reason we're worrying about this is that an adversary may
force worst-case performance.  If the python instance isn't a server,
or at least isn't exposed to untrusted clients, then even a single
extra "if" test is unjustified overhead.  Adding overhead to every
string hash or every dict lookup is bad.

That said, adding some overhead (only) to dict lookups *that already
hit half a dozen consecutive collisions* probably is reasonable,
because that won't happen very often with normal data.  (6 collisions
can't happen at all unless there are already at least 6 entries, so
small dicts are safe; with at least 1/3 of the slots empty, it should
happen only 1/729 for worst-size larger dicts.)


More information about the Python-Dev mailing list