[Python-Dev] Hash randomization for which types?

Shell Xu shell909090 at gmail.com
Tue Feb 16 22:45:57 EST 2016

I think you are right. Here is the source code from Python 2.7.11:

long
PyObject_Hash(PyObject *v)
{
    PyTypeObject *tp = v->ob_type;
    if (tp->tp_hash != NULL)
        return (*tp->tp_hash)(v);
    /* To keep to the general practice that inheriting
     * solely from object in C code should work without
     * an explicit call to PyType_Ready, we implicitly call
     * PyType_Ready here and then check the tp_hash slot again
     */
    if (tp->tp_dict == NULL) {
        if (PyType_Ready(tp) < 0)
            return -1;
        if (tp->tp_hash != NULL)
            return (*tp->tp_hash)(v);
    }
    if (tp->tp_compare == NULL && RICHCOMPARE(tp) == NULL) {
        return _Py_HashPointer(v); /* Use address as hash value */
    }
    /* If there's a cmp but no hash defined, the object can't be hashed */
    return PyObject_HashNotImplemented(v);
}

If the object's type defines a hash function (tp_hash), that function is
used. If it does not, and the type defines no comparison either,
_Py_HashPointer is used, and _Py_HashPointer does not use _Py_HashSecret.
I also checked the references to _Py_HashSecret: only bufferobject,
unicodeobject and stringobject use it.
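
The same thing can be checked from the Python side. Below is a minimal
sketch, assuming a Python 3 interpreter: it starts fresh child interpreters
with different PYTHONHASHSEED values and reports whether the hash of an
expression changes with the seed. The helper names hash_in_child and
varies_with_seed are made up for this example and are not part of CPython.

```python
import os
import subprocess
import sys


def hash_in_child(expr, seed):
    """Return hash(<expr>) as computed by a fresh interpreter.

    expr is Python source text (e.g. "'abc'"); seed is passed to the
    child through the PYTHONHASHSEED environment variable.
    """
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.check_output(
        [sys.executable, "-c", "print(hash(%s))" % expr], env=env)
    return int(out.decode().strip())


def varies_with_seed(expr, seeds=(1, 2, 3)):
    """True if hash(<expr>) differs between the given seeds."""
    return len({hash_in_child(expr, seed) for seed in seeds}) > 1


if __name__ == "__main__":
    # Hypothetical sample inputs: a str, a bytes, an int, a tuple of ints.
    for expr in ("'abc'", "b'abc'", "42", "(23, 42, 99, 100)"):
        print(expr, "varies with seed:", varies_with_seed(expr))
```

Setting the seed explicitly, instead of relying on the default random seed,
makes the comparison repeatable between runs.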

On Wed, Feb 17, 2016 at 9:54 AM, Steven D'Aprano <steve at pearwood.info>
wrote:

> On Tue, Feb 16, 2016 at 11:56:55AM -0800, Glenn Linderman wrote:
> > On 2/16/2016 1:48 AM, Christoph Groth wrote:
> > >Hello,
> > >
> > >Recent Python versions randomize the hashes of str, bytes and datetime
> > >objects.  I suppose that the choice of these three types is the result
> > >of a compromise.  Has this been discussed somewhere publicly?
> >
> > Search archives of this list... it was discussed at length.
> There's a lot of discussion on the mailing list. I think that this is
> the very start of it, in Dec 2011:
> https://mail.python.org/pipermail/python-dev/2011-December/115116.html
> and continuing into 2012, for example:
> https://mail.python.org/pipermail/python-dev/2012-January/115577.html
> https://mail.python.org/pipermail/python-dev/2012-January/115690.html
> and a LOT more, spread over many different threads and subject lines.
> You should also read the issue on the bug tracker:
> http://bugs.python.org/issue13703
> My recollection is that it was decided that only strings and bytes need
> to have their hashes randomized, because only strings and bytes can be
> used directly from user-input without first having a conversion step
> with likely input range validation. In addition, changing the hash for
> ints would break too much code for too little benefit: unlike strings,
> where hash collision attacks on web apps are proven and easy, hash
> collision attacks based on ints are more difficult and rare.
> See also the comment here:
> http://bugs.python.org/issue13703#msg151847
> > >I'm not a web programmer, but don't web applications also use
> > >dictionaries that are indexed by, say, tuples of integers?
> >
> > Sure, and that is the biggest part of the reason they were randomized.
> But they aren't, as far as I can see:
> [steve at ando 3.6]$ ./python -c "print(hash((23, 42, 99, 100)))"
> 1071302475
> [steve at ando 3.6]$ ./python -c "print(hash((23, 42, 99, 100)))"
> 1071302475
> Web apps can use dicts indexed by anything that they like, but unless
> there is an actual attack, what does it matter? Guido makes a good point
> about security here:
> https://mail.python.org/pipermail/python-dev/2013-October/129181.html
> > I think hashes of all types have been randomized, not _just_ the list
> > you mentioned.
> I'm pretty sure that's not actually the case. Using 3.6 from the repo
> (admittedly not fully up to date though), I can see hash randomization
> working for strings:
> [steve at ando 3.6]$ ./python -c "print(hash('abc'))"
> 11601873
> [steve at ando 3.6]$ ./python -c "print(hash('abc'))"
> -2009889747
> but not for ints:
> [steve at ando 3.6]$ ./python -c "print(hash(42))"
> 42
> [steve at ando 3.6]$ ./python -c "print(hash(42))"
> 42
> which agrees with my recollection that only strings and bytes would be
> randomized.
> --
> Steve
