<br><br><div class="gmail_quote">On Sun, Jan 15, 2012 at 9:44 AM, Guido van Rossum <span dir="ltr"><<a href="mailto:guido@python.org">guido@python.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="gmail_quote"><div><div class="h5">On Sun, Jan 15, 2012 at 8:46 AM, Stefan Behnel <span dir="ltr"><<a href="mailto:stefan_ml@behnel.de" target="_blank">stefan_ml@behnel.de</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Guido van Rossum, 15.01.2012 17:10:<br>
<div>> On Sun, Jan 15, 2012 at 6:30 AM, Stefan Behnel wrote:<br>
>> Terry Reedy, 14.01.2012 06:43:<br>
>>> On 1/13/2012 8:58 PM, Gregory P. Smith wrote:<br>
>>><br>
>>>> It is perfectly okay to break existing users who had anything depending<br>
>>>> on ordering of internal hash tables. Their code was already broken.<br>
>>><br>
>>> Given that the doc says "Return the hash value of the object", I do not<br>
>>> think we should be so hard-nosed. The above clearly implies that there is<br>
>>> such a thing as *the* Python hash value for an object. And indeed, that has<br>
>>> been true across many versions. If we had written "Return a hash value for<br>
>>> the object, which can vary from run to run", the case would be different.<br>
>><br>
>> Just a side note, but I don't think hash() is the right place to document<br>
>> this.<br>
><br>
> You mean we shouldn't document that the hash() of a string will vary per<br>
> run?<br>
<br>
</div>No, I mean that the hash() builtin function is not the right place to<br>
document the behaviour of a string hash. That should go into the string<br>
object documentation.<br>
<br>
Although, arguably, it may be worth mentioning in the docs of hash() that,<br>
in general, hash values of builtin types are bound to the lifetime of the<br>
interpreter instance (or entire runtime?) and may change after restarts. I<br>
think that's a reasonable restriction to document prominently, even if<br>
it will only apply to str for the time being.<br><div></div></blockquote></div></div><div><br>Actually it will apply to a lot more than str, because the hash of (immutable) compound objects is often derived from the hash of the constituents, e.g. hash of a tuple.<br>
</div><div class="im"><blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div>
>> Hashing is a protocol in Python, just like indexing or iteration.<br>
>> Nothing keeps an object from changing its hash value due to modification,<br>
><br>
> Eh? There's a huge body of cultural awareness that only immutable objects<br>
> should define a hash, implying that the hash remains constant during the<br>
> object's lifetime.<br>
><br>
>> and that would even be valid in the face of the usual dict lookup<br>
>> invariants if changes are only applied while the object is not referenced<br>
>> by any dict.<br>
><br>
> And how would you know it isn't?<br>
<br>
</div>Well, if it's an object with a mutable hash then it's up to the application<br>
defining that object to make sure it's used in a sensible way. Immutability<br>
just makes your life easier. I can imagine that an object gets removed from<br>
a dict (say, a cache), modified and then reinserted, and I think it's valid<br>
to allow the modification to have an impact on the hash in this case, in<br>
order to accommodate any changes to equality comparisons due to the<br>
modification.<br></blockquote></div><div><br>That could be considered valid only in a very abstract, theoretical, non-constructive way, since there is no protocol to detect removal from a dict (and you cannot assume an object is used in only one dict at a time).<br>
</div><div class="im"><blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
That being said, it seems that the Python docs actually consider constant<br>
hashes a requirement rather than a virtue.<br>
<br>
<a href="http://docs.python.org/glossary.html#term-hashable" target="_blank">http://docs.python.org/glossary.html#term-hashable</a><br>
<br>
"""<br>
An object is hashable if it has a hash value which never changes during its<br>
lifetime (it needs a __hash__() method), and can be compared to other<br>
objects (it needs an __eq__() or __cmp__() method). Hashable objects which<br>
compare equal must have the same hash value.<br>
"""<br>
<br>
It also seems to me that the wording "has a hash value which never changes<br>
during its lifetime" makes it pretty clear that the lifetime of the hash<br>
value is not guaranteed to exceed the lifetime of the object (although<br>
that's a rather muddy definition - memory lifetime? or pickle-unpickle as<br>
well?).<br></blockquote></div><div><br>Across pickle-unpickle it's not considered the same object. Pickling at best preserves values.<br></div></div></blockquote><div><br></div><div>Updating the docs to clarify this explicitly sounds like a good idea. How does the following wording, to be added to the hashing section of glossary.rst, sound?</div>
<div><br></div><div>"""Hash values may not be stable across Python processes and must not be used for storage or otherwise communicated outside of a single Python session."""</div><div><br></div>
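<div>A minimal sketch of what that wording guards against, assuming a CPython build with string hash randomization and the <code>PYTHONHASHSEED</code> environment variable (neither is named in this message, and the helper names below are hypothetical). It computes <code>hash()</code> of the same string in fresh interpreter processes under different seeds, and contrasts that with a <code>hashlib</code> digest, which depends only on the bytes being hashed:</div>
<pre>
```python
import hashlib
import os
import subprocess
import sys


def str_hash_in_child(text, seed):
    """Return hash(text) as computed by a fresh interpreter process.

    Assumption: the interpreter honours the PYTHONHASHSEED environment
    variable, so a fixed seed gives a reproducible string hash.
    """
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", text],
        env=env,
    )
    return int(out.decode("ascii").strip())


def stable_digest(text):
    """Return a digest that is safe to store or send to another process."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Same seed: the two child processes agree on the hash value.
    print(str_hash_in_child("spam", 1) == str_hash_in_child("spam", 1))
    # Different seeds stand in for two unrelated Python sessions.
    print(str_hash_in_child("spam", 1) == str_hash_in_child("spam", 2))
    # The hashlib digest does not depend on the process at all.
    print(stable_digest("spam"))
```
</pre>
<div>Anything that has to survive a restart or cross a process boundary (cache keys on disk, shard assignments, wire formats) would use something like <code>stable_digest()</code> rather than <code>hash()</code>.</div>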
<div>-gps</div></div>