[Python-Dev] Hash collision security issue (now public)

Alex Gaynor alex.gaynor at gmail.com
Thu Dec 29 02:51:21 CET 2011


A few thoughts on this:

a) This is not a new issue, I'm curious what the new interest is in it.

b) Whatever the solution to this is, it is *not* CPython specific, any decision
should be reflected in the Python language spec IMO, if CPython has the semantic
that dicts aren't vulnerable to hash collision then users *will* rely on this
and another implementation having a different (valid) behavior opens up users to
security issues.

c) I'm not convinced a randomized hash is appropriate for the default dict, for
a number of reasons: it's a performance hit on every dict operations, using a
per-process seed means you can't compile the hash of an obj at Python's compile
time, a per-dict seed inhibits a bunch of other optimizations.  These may not be
relevant to CPython, but they are to PyPy and probably the invoke-dynamic work
on Jython (pursuant to point b).

Therefore I think these should be considered application issues, since request
limiting is difficult and error prone, I'd recommend the Python stdlib including
a non-hash based map (such as a binary tree).

Alex



More information about the Python-Dev mailing list