[Python-Dev] Hash randomization for which types?

Christoph Groth christoph at grothesque.org
Wed Feb 17 09:51:50 EST 2016


Steven D'Aprano wrote:
> On Tue, Feb 16, 2016 at 11:56:55AM -0800, Glenn Linderman wrote:
>> On 2/16/2016 1:48 AM, Christoph Groth wrote:
>> >Recent Python versions randomize the hashes of str, bytes and datetime 
>> >objects.  I suppose that the choice of these three types is the result 
>> >of a compromise.  Has this been discussed somewhere publicly?
>> 
>> Search archives of this list... it was discussed at length.
>
> There's a lot of discussion on the mailing list. I think that this is 
> the very start of it, in Dec 2011:
> (...)

I searched for an hour or so myself and found many discussions, but
none about whether hashes of other types should be randomized as well.
The relevant PEP also doesn't touch on this issue.

> My recollection is that it was decided that only strings and bytes need 
> to have their hashes randomized, because only strings and bytes can be 
> used directly from user-input without first having a conversion step 
> with likely input range validation. In addition, changing the hash for 
> ints would break too much code for too little benefit: unlike strings, 
> where hash collision attacks on web apps are proven and easy, hash 
> collision attacks based on ints are more difficult and rare.
>
> See also the comment here:
>
> http://bugs.python.org/issue13703#msg151847

Perfect, that's exactly what I was looking for.  I am reassured that
this has been thought through.  Thanks a lot!
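
For the archives, here is a minimal sketch of how one could observe the
difference described above. It assumes a recent CPython in which the
PYTHONHASHSEED environment variable is honoured; hash_in_child is just a
helper name made up for this illustration, and the seeds 1 and 2 are
arbitrary.

```python
import os
import subprocess
import sys


def hash_in_child(expr, seed):
    """Return hash(<expr>) as computed by a fresh interpreter.

    The child process is started with PYTHONHASHSEED set to `seed`,
    so two calls with different seeds mimic two separate program runs.
    """
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    code = "print(hash({}))".format(expr)
    out = subprocess.check_output([sys.executable, "-c", code], env=env)
    return int(out)


# Compare the hash of each literal under two different seeds.
for expr in ("'spam'", "b'spam'", "12345"):
    first = hash_in_child(expr, 1)
    second = hash_in_child(expr, 2)
    verdict = "same" if first == second else "differs"
    print("{:10} {}".format(expr, verdict))
```

On the interpreters I would expect this to run on, the str and bytes
hashes change with the seed while the int hash does not, which matches
the reasoning quoted above.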

Christoph


