[Python-Dev] Status of the fix for the hash collision vulnerability

Gregory P. Smith greg at krypto.org
Mon Jan 16 21:16:38 CET 2012

On Sun, Jan 15, 2012 at 9:44 AM, Guido van Rossum <guido at python.org> wrote:

> On Sun, Jan 15, 2012 at 8:46 AM, Stefan Behnel <stefan_ml at behnel.de>wrote:
>> Guido van Rossum, 15.01.2012 17:10:
>> > On Sun, Jan 15, 2012 at 6:30 AM, Stefan Behnel wrote:
>> >> Terry Reedy, 14.01.2012 06:43:
>> >>> On 1/13/2012 8:58 PM, Gregory P. Smith wrote:
>> >>>
>> >>>> It is perfectly okay to break existing users who had anything
>> depending
>> >>>> on ordering of internal hash tables. Their code was already broken.
>> >>>
>> >>> Given that the doc says "Return the hash value of the object", I do
>> not
>> >>> think we should be so hard-nosed. The above clearly implies that
>> there is
>> >>> such a thing as *the* Python hash value for an object. And indeed,
>> that
>> >> has
>> >>> been true across many versions. If we had written "Return a hash value
>> >> for
>> >>> the object, which can vary from run to run", the case would be
>> different.
>> >>
>> >> Just a side note, but I don't think hash() is the right place to
>> document
>> >> this.
>> >
>> > You mean we shouldn't document that the hash() of a string will vary per
>> > run?
>> No, I mean that the hash() builtin function is not the right place to
>> document the behaviour of a string hash. That should go into the string
>> object documentation.
>> Although, arguably, it may be worth mentioning in the docs of hash() that,
>> in general, hash values of builtin types are bound to the lifetime of the
>> interpreter instance (or entire runtime?) and may change after restarts. I
>> think that's a reasonable restriction to document that prominently, even
>> if
>> it will only apply to str for the time being.
> Actually it will apply to a lot more than str, because the hash of
> (immutable) compound objects is often derived from the hash of the
> constituents, e.g. hash of a tuple.
>> >> Hashing is a protocol in Python, just like indexing or iteration.
>> >> Nothing keeps an object from changing its hash value due to
>> modification,
>> >
>> > Eh? There's a huge body of cultural awareness that only immutable
>> objects
>> > should define a hash, implying that the hash remains constant during the
>> > object's lifetime.
>> >
>> >> and that would even be valid in the face of the usual dict lookup
>> >> invariants if changes are only applied while the object is not
>> referenced
>> >> by any dict.
>> >
>> > And how would you know it isn't?
>> Well, if it's an object with a mutable hash then it's up to the
>> application
>> defining that object to make sure it's used in a sensible way.
>> Immutability
>> just makes your life easier. I can imagine that an object gets removed
>> from
>> a dict (say, a cache), modified and then reinserted, and I think it's
>> valid
>> to allow the modification to have an impact on the hash in this case, in
>> order to accommodate for any changes to equality comparisons due to the
>> modification.
> That could be considered valid only in a very abstract, theoretical,
> non-constructive way, since there is no protocol to detect removal from a
> dict (and you cannot assume an object is used in only one dict at a time).
>> That being said, it seems that the Python docs actually consider constant
>> hashes a requirement rather than a virtue.
>> http://docs.python.org/glossary.html#term-hashable
>> """
>> An object is hashable if it has a hash value which never changes during
>> its
>> lifetime (it needs a __hash__() method), and can be compared to other
>> objects (it needs an __eq__() or __cmp__() method). Hashable objects which
>> compare equal must have the same hash value.
>> """
>> It also seems to me that the wording "has a hash value which never changes
>> during its lifetime" makes it pretty clear that the lifetime of the hash
>> value is not guaranteed to supersede the lifetime of the object (although
>> that's a rather muddy definition - memory lifetime? or pickle-unpickle as
>> well?).
> Across pickle-unpickle it's not considered the same object. Pickling at
> best preserves values.

Updating the docs to explicitly clarify this sounds like a good idea.  How
does this wording to be added to the glossary.rst hashing section sound?

"""Hash values may not be stable across Python processes and must not be
used for storage or otherwise communicated outside of a single Python

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20120116/8c134b25/attachment.html>

More information about the Python-Dev mailing list