[Python-Dev] Status of the fix for the hash collision vulnerability

Guido van Rossum guido at python.org
Sun Jan 15 18:44:08 CET 2012


On Sun, Jan 15, 2012 at 8:46 AM, Stefan Behnel <stefan_ml at behnel.de> wrote:

> Guido van Rossum, 15.01.2012 17:10:
> > On Sun, Jan 15, 2012 at 6:30 AM, Stefan Behnel wrote:
> >> Terry Reedy, 14.01.2012 06:43:
> >>> On 1/13/2012 8:58 PM, Gregory P. Smith wrote:
> >>>
> >>>> It is perfectly okay to break existing users who had anything
> depending
> >>>> on ordering of internal hash tables. Their code was already broken.
> >>>
> >>> Given that the doc says "Return the hash value of the object", I do not
> >>> think we should be so hard-nosed. The above clearly implies that there
> is
> >>> such a thing as *the* Python hash value for an object. And indeed, that
> >> has
> >>> been true across many versions. If we had written "Return a hash value
> >> for
> >>> the object, which can vary from run to run", the case would be
> different.
> >>
> >> Just a side note, but I don't think hash() is the right place to
> document
> >> this.
> >
> > You mean we shouldn't document that the hash() of a string will vary per
> > run?
>
> No, I mean that the hash() builtin function is not the right place to
> document the behaviour of a string hash. That should go into the string
> object documentation.
>
> Although, arguably, it may be worth mentioning in the docs of hash() that,
> in general, hash values of builtin types are bound to the lifetime of the
> interpreter instance (or entire runtime?) and may change after restarts. I
> think that's a reasonable restriction to document that prominently, even if
> it will only apply to str for the time being.
>

Actually it will apply to a lot more than str, because the hash of
(immutable) compound objects is often derived from the hash of the
constituents, e.g. hash of a tuple.


> >> Hashing is a protocol in Python, just like indexing or iteration.
> >> Nothing keeps an object from changing its hash value due to
> modification,
> >
> > Eh? There's a huge body of cultural awareness that only immutable objects
> > should define a hash, implying that the hash remains constant during the
> > object's lifetime.
> >
> >> and that would even be valid in the face of the usual dict lookup
> >> invariants if changes are only applied while the object is not
> referenced
> >> by any dict.
> >
> > And how would you know it isn't?
>
> Well, if it's an object with a mutable hash then it's up to the application
> defining that object to make sure it's used in a sensible way. Immutability
> just makes your life easier. I can imagine that an object gets removed from
> a dict (say, a cache), modified and then reinserted, and I think it's valid
> to allow the modification to have an impact on the hash in this case, in
> order to accommodate for any changes to equality comparisons due to the
> modification.
>

That could be considered valid only in a very abstract, theoretical,
non-constructive way, since there is no protocol to detect removal from a
dict (and you cannot assume an object is used in only one dict at a time).


> That being said, it seems that the Python docs actually consider constant
> hashes a requirement rather than a virtue.
>
> http://docs.python.org/glossary.html#term-hashable
>
> """
> An object is hashable if it has a hash value which never changes during its
> lifetime (it needs a __hash__() method), and can be compared to other
> objects (it needs an __eq__() or __cmp__() method). Hashable objects which
> compare equal must have the same hash value.
> """
>
> It also seems to me that the wording "has a hash value which never changes
> during its lifetime" makes it pretty clear that the lifetime of the hash
> value is not guaranteed to supersede the lifetime of the object (although
> that's a rather muddy definition - memory lifetime? or pickle-unpickle as
> well?).
>

Across pickle-unpickle it's not considered the same object. Pickling at
best preserves values.

However, this entry in the glossary only seems to have appeared with Py2.6,
> likely as a result of the abc changes. So it won't help in defending a
> change to the hash function.
>
>
> >> So the guarantees do not depend on the function hash() and may
> >> be even weaker than your above statement.
> >
> > There are no actual guarantees for hash(), but lots of rules for
> > well-behaved hashes.
>
> Absolutely.
>

-- 
--Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20120115/398f4ec3/attachment-0001.html>


More information about the Python-Dev mailing list