[Python-Dev] Status of the fix for the hash collision vulnerability

Stefan Behnel stefan_ml at behnel.de
Sun Jan 15 17:46:36 CET 2012


Guido van Rossum, 15.01.2012 17:10:
> On Sun, Jan 15, 2012 at 6:30 AM, Stefan Behnel wrote:
>> Terry Reedy, 14.01.2012 06:43:
>>> On 1/13/2012 8:58 PM, Gregory P. Smith wrote:
>>>
>>>> It is perfectly okay to break existing users who had anything depending
>>>> on ordering of internal hash tables. Their code was already broken.
>>>
>>> Given that the doc says "Return the hash value of the object", I do not
>>> think we should be so hard-nosed. The above clearly implies that there is
>>> such a thing as *the* Python hash value for an object. And indeed, that
>> has
>>> been true across many versions. If we had written "Return a hash value
>> for
>>> the object, which can vary from run to run", the case would be different.
>>
>> Just a side note, but I don't think hash() is the right place to document
>> this.
> 
> You mean we shouldn't document that the hash() of a string will vary per
> run?

No, I mean that the hash() builtin function is not the right place to
document the behaviour of a string hash. That should go into the string
object documentation.

Although, arguably, it may be worth mentioning in the docs of hash() that,
in general, hash values of builtin types are bound to the lifetime of the
interpreter instance (or entire runtime?) and may change after restarts. I
think that's a reasonable restriction to document that prominently, even if
it will only apply to str for the time being.


>> Hashing is a protocol in Python, just like indexing or iteration.
>> Nothing keeps an object from changing its hash value due to modification,
> 
> Eh? There's a huge body of cultural awareness that only immutable objects
> should define a hash, implying that the hash remains constant during the
> object's lifetime.
> 
>> and that would even be valid in the face of the usual dict lookup
>> invariants if changes are only applied while the object is not referenced
>> by any dict.
> 
> And how would you know it isn't?

Well, if it's an object with a mutable hash then it's up to the application
defining that object to make sure it's used in a sensible way. Immutability
just makes your life easier. I can imagine that an object gets removed from
a dict (say, a cache), modified and then reinserted, and I think it's valid
to allow the modification to have an impact on the hash in this case, in
order to accommodate for any changes to equality comparisons due to the
modification.

That being said, it seems that the Python docs actually consider constant
hashes a requirement rather than a virtue.

http://docs.python.org/glossary.html#term-hashable

"""
An object is hashable if it has a hash value which never changes during its
lifetime (it needs a __hash__() method), and can be compared to other
objects (it needs an __eq__() or __cmp__() method). Hashable objects which
compare equal must have the same hash value.
"""

It also seems to me that the wording "has a hash value which never changes
during its lifetime" makes it pretty clear that the lifetime of the hash
value is not guaranteed to supersede the lifetime of the object (although
that's a rather muddy definition - memory lifetime? or pickle-unpickle as
well?).

However, this entry in the glossary only seems to have appeared with Py2.6,
likely as a result of the abc changes. So it won't help in defending a
change to the hash function.


>> So the guarantees do not depend on the function hash() and may
>> be even weaker than your above statement.
> 
> There are no actual guarantees for hash(), but lots of rules for
> well-behaved hashes.

Absolutely.

Stefan



More information about the Python-Dev mailing list