[Python-Dev] Hash randomization for which types?

Steven D'Aprano steve at pearwood.info
Tue Feb 16 20:54:45 EST 2016


On Tue, Feb 16, 2016 at 11:56:55AM -0800, Glenn Linderman wrote:
> On 2/16/2016 1:48 AM, Christoph Groth wrote:
> >Hello,
> >
> >Recent Python versions randomize the hashes of str, bytes and datetime 
> >objects.  I suppose that the choice of these three types is the result 
> >of a compromise.  Has this been discussed somewhere publicly?
> 
> Search archives of this list... it was discussed at length.

There's a lot of discussion on the mailing list. I think that this is 
the very start of it, in Dec 2011:

https://mail.python.org/pipermail/python-dev/2011-December/115116.html

and continuing into 2012, for example:

https://mail.python.org/pipermail/python-dev/2012-January/115577.html
https://mail.python.org/pipermail/python-dev/2012-January/115690.html

and a LOT more, spread over many different threads and subject lines.

You should also read the issue on the bug tracker:

http://bugs.python.org/issue13703


My recollection is that the decision was to randomize only the hashes 
of strings and bytes, because only those types can be used directly 
from user input. Anything else first goes through a conversion step, 
usually with some range validation of the input. In addition, changing 
the hash for ints would break too much code for too little benefit: 
unlike strings, where hash collision attacks on web apps are proven and 
easy, hash collision attacks based on ints are more difficult and rare.
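
To make the point about ints concrete, here is a minimal sketch. It 
assumes CPython, where the hash of a non-negative int is the value 
reduced modulo sys.hash_info.modulus; the helper name colliding_ints is 
made up for illustration. Colliding int keys are easy to compute, but 
they are huge, so an attacker would have to get them past whatever 
conversion and range checks sit in front of the dict.

```python
import sys

# CPython detail (an assumption of this sketch): for non-negative ints,
# hash(n) == n % sys.hash_info.modulus. The modulus is a Mersenne prime,
# 2**61 - 1 on typical 64-bit builds.
P = sys.hash_info.modulus


def colliding_ints(count):
    """Return `count` distinct non-negative ints that all hash to 0."""
    return [i * P for i in range(count)]


keys = colliding_ints(5)
print([hash(k) for k in keys])  # every entry is 0
print(hash(42))                 # small ints hash to themselves: 42
```

Apart from 0 itself, every one of those keys is at least as large as 
the modulus, which is the kind of value that ordinary input validation 
tends to reject.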

See also the comment here:

http://bugs.python.org/issue13703#msg151847



> >I'm not a web programmer, but don't web applications also use 
> >dictionaries that are indexed by, say, tuples of integers?
> 
> Sure, and that is the biggest part of the reason they were randomized.  

But they aren't, as far as I can see:

[steve@ando 3.6]$ ./python -c "print(hash((23, 42, 99, 100)))"
1071302475
[steve@ando 3.6]$ ./python -c "print(hash((23, 42, 99, 100)))"
1071302475
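
The same check can be scripted. This is only a sketch: the helper names 
hash_under_seed and varies_with_seed are hypothetical, and it relies on 
the PYTHONHASHSEED environment variable to pin the seed in each child 
interpreter. As far as I can tell, a tuple's hash is built from the 
hashes of its elements, so a tuple of ints stays put while a tuple of 
strings moves with the seed.

```python
import os
import subprocess
import sys


def hash_under_seed(expr, seed):
    """Evaluate hash(<expr>) in a fresh interpreter with a fixed hash seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.check_output(
        [sys.executable, "-c", "print(hash({}))".format(expr)], env=env)
    return int(out.decode())


def varies_with_seed(expr, seeds=(1, 2, 3)):
    """True if hash(<expr>) differs between at least two of the seeds."""
    return len({hash_under_seed(expr, s) for s in seeds}) > 1


print(varies_with_seed("(23, 42, 99, 100)"))  # tuple of ints: False
print(varies_with_seed("('a', 'b')"))         # tuple of strs: True
```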

Web apps can use dicts indexed by anything that they like, but unless 
there is an actual attack, what does it matter? Guido makes a good point 
about security here:

https://mail.python.org/pipermail/python-dev/2013-October/129181.html



> I think hashes of all types have been randomized, not _just_ the list 
> you mentioned.

I'm pretty sure that's not actually the case. Using 3.6 from the repo 
(admittedly not fully up to date though), I can see hash randomization 
working for strings:

[steve@ando 3.6]$ ./python -c "print(hash('abc'))"
11601873
[steve@ando 3.6]$ ./python -c "print(hash('abc'))"
-2009889747

but not for ints:

[steve@ando 3.6]$ ./python -c "print(hash(42))"
42
[steve@ando 3.6]$ ./python -c "print(hash(42))"
42


which agrees with my recollection that only strings and bytes would be 
randomized.
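
For anyone who wants to repeat the comparison on their own build, here 
is a rough sketch that checks str, bytes and int in one go. The sample 
values and the names SAMPLES, hashes_with_seed and randomized_flags are 
my own choices for illustration; the only real interface used is the 
PYTHONHASHSEED environment variable.

```python
import json
import os
import subprocess
import sys

# Sample values to hash in the child interpreter: a str, a bytes, an int.
SAMPLES = "['abc', b'abc', 42]"


def hashes_with_seed(seed):
    """Return [hash(v) for each sample] from a fresh interpreter."""
    code = ("import json; "
            "print(json.dumps([hash(v) for v in {}]))".format(SAMPLES))
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.check_output([sys.executable, "-c", code], env=env)
    return json.loads(out.decode())


def randomized_flags(seeds=(1, 2, 3)):
    """For each sample, True if its hash changes between the given seeds."""
    runs = [hashes_with_seed(s) for s in seeds]
    return [len(set(column)) > 1 for column in zip(*runs)]


print(randomized_flags())  # [True, True, False] for str, bytes, int
```

Running with a fixed seed also gives repeatable string hashes, which is 
handy when a test suite depends on dict or set ordering.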



-- 
Steve

