[issue13703] Hash collision security issue
Gregory P. Smith
report at bugs.python.org
Tue Jan 24 01:44:44 CET 2012
Gregory P. Smith <greg at krypto.org> added the comment:
On Mon, Jan 23, 2012 at 1:32 PM, Dave Malcolm <report at bugs.python.org> wrote:
> Dave Malcolm <dmalcolm at redhat.com> added the comment:
> I'm attaching an attempt at backporting haypo's random-8.patch to 2.7
> Changes relative to random-8.patch:
> * The randomization is off by default, and must be enabled by setting
> a new environment variable PYTHONHASHRANDOMIZATION to a non-empty string.
> (if so then, PYTHONHASHSEED also still works, if provided, in the same
> way as in haypo's patch)
> * All of the various "Py_hash_t" become "long" again (Py_hash_t was
> added in 3.2: issue9778)
> * I expanded the randomization from just PyUnicodeObject to also cover
> these types:
> * PyStringObject
> * PyBufferObject
> The randomization does not cover numeric types: if we change the hash of
> int so that hash(i) no longer equals i, we also have to change it
> consistently for long, float, complex, decimal.Decimal and
> fractions.Fraction; however, there are 3rd-party numeric types that
> have their own __hash__ implementation that mimics int.__hash__ (see
> e.g. numpy)
> As noted in http://bugs.python.org/issue13703#msg151063 and
> http://bugs.python.org/issue13703#msg151064, it's not possible
> to directly create a dict with integer keys via JSON or XML-RPC.
> This seems like a tradeoff between the risk of attack via other means
> vs breakage induced by not having hash() == hash() for the various
> equivalent numerical representations in pre-existing code.
Exactly. I would NOT worry about hash repeatability for integers and
complex data structures. It is not at the core of the common problem
(maybe a couple application specific problems but not a general "all
python web apps" severity problem).
Doing it for base byte string and unicode string like objects is
sufficient. Good catch on doing it for buffer objects, I'd forgotten
about those. ;) A big flaw with haypo's patch is that it only
considers unicode instead of all byte-string-ish stuff. (the code in
issue13704 does that better).
> * To support my expanded usage of the random secret, I moved:
> PyAPI_DATA(_Py_unicode_hash_secret_t) _Py_unicode_hash_secret
> from unicodeobject.h to object.h and renamed it to:
> PyAPI_DATA(_Py_HashSecret_t) _Py_HashSecret;
> This also exposes it for usage by C extension modules, just in case
> they need it (Murphy's Law suggests we will need if we don't expose
> it). This is an extension of the API, but warranted, I feel. My
> plan for downstream RHEL is to add this explicitly to the RPM metadata
> as a "Provides" of the RPM providing libpython.so so that if something
> needs to use it, it can express a "Requires" on it; I assume that
> something similar is possible with .deb)
Exposing this is good. There is a hash table implementation within
modules/expat/xmlparse.c that should probably use it as well.
> * generalized test_unicode.HashTest to support the new env var and the
> additional types. In my version, get_hash takes a _repr string rather
> than an object, so that I can test it with a buffer(). Arguably the
> tests should thus be moved from test_unicode to somewhere else, but this
> location keeps things consistent with haypo's patch.
> haypo: in random-8.patch, within test_unicode.HashTest.test_null_hash,
> "hash_empty" seems to be misnamed
Lets move this to a better location in all patches. At this point
haypo's patch is not done yet so relevant bits of what you are doing
here is likely to be fed back into the eventual 3.3 tip patch.
Python tracker <report at bugs.python.org>
More information about the Python-bugs-list