[issue13707] Clarify hash() lifetime
New submission from Terry J. Reedy <tjreedy@udel.edu>:

Current 3.2.2 docs:

id(object) Return the “identity” of an object. This is an integer which is guaranteed to be unique and constant for this object during its lifetime. [model]

hash(object) Return the hash value of the object (if it has one). Hash values are integers. They are used to quickly compare dictionary keys

Suggestion: change "Hash values are integers. They ..." to "This should be an integer which is constant for this object during its lifetime. Hash values ..."

Rationale: For builtin class instances, hash values are guaranteed to be constant that long, and only that long, as the default hash(ob) for object() instances is currently id(ob) // 16 (the minimum object size) on my Win7, 64-bit, CPython 3.2.2. User class instance hashes (with custom __hash__) *should* have the same lifetime, but since Python cannot enforce that, I did not say 'guaranteed'. User code should *not* depend on a longer lifetime, just as for id() output. It seems worth implying that, as for id(), because (based on a recent pydev discussion) people seem prone to over-generalize the current longer-term stability of number and string hashes, which may itself disappear in future releases (see #13703).

---------- assignee: docs@python components: Documentation messages: 150561 nosy: docs@python, terry.reedy priority: normal severity: normal stage: needs patch status: open title: Clarify hash() lifetime type: enhancement versions: Python 2.7, Python 3.2, Python 3.3 _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue13707> _______________________________________
Changes by Alex Gaynor <alex.gaynor@gmail.com>: ---------- nosy: +alex
Martin v. Löwis <martin@v.loewis.de> added the comment: -1. The hash has nothing to do with the lifetime, but with the value of an object. ---------- nosy: +loewis
Terry J. Reedy <tjreedy@udel.edu> added the comment: Martin, I do not understand. The default hash is based on id (as is default equality comparison), not value. Are you OK with hash values changing if the 'value' changes? My understanding is that changing hash values for objects in sets and dicts is bad, which is why mutable builtins with value-based equality do not have hash values. ---------- title: Clarify hash() lifetime -> Clarify hash() constancy period
Martin v. Löwis <martin@v.loewis.de> added the comment:
> Martin, I do not understand. The default hash is based on id (as is default equality comparison), not value.
In the default implementation, the id *is* the object's value (i.e. objects, by default, only compare equal if they are identical). So the default implementation is just a special case of the more general rule that hashes need to be consistent with equality.
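Martin's description of the default implementation can be checked in a few lines. This is a minimal sketch, assuming CPython 3.x; the class name `Plain` is made up for illustration and simply keeps the `__eq__` and `__hash__` it inherits from `object`.

```python
class Plain:
    """Hypothetical class that keeps object's default __eq__ and __hash__."""


a = Plain()
b = Plain()

# With the default __eq__, equality falls back to identity:
# two distinct instances never compare equal, an instance always equals itself.
print(a == b)  # False
print(a == a)  # True

# The default hash therefore only has to agree with identity,
# and it does not change while the object is alive.
print(hash(a) == hash(a))  # True
```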
> Are you OK with hash values changing if the 'value' changes?
An object that can change its value (i.e. a mutable object) should fail to hash.
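A short sketch of the rule Martin states, assuming Python 3 semantics (`is_hashable` and `Point` are hypothetical names used only here): built-in mutable containers such as `list` refuse to hash, and a user class that defines `__eq__` without also defining `__hash__` gets `__hash__` set to `None`, which makes its instances unhashable as well.

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


class Point:
    """Hypothetical mutable class with value-based equality and no __hash__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)


print(is_hashable([1, 2, 3]))    # False: list is mutable
print(is_hashable((1, 2, 3)))    # True: tuple of hashable items
print(Point.__hash__ is None)    # True: __eq__ defined without __hash__
print(is_hashable(Point(1, 2)))  # False
```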
Marc-Andre Lemburg <mal@egenix.com> added the comment: Terry J. Reedy wrote:
> Terry J. Reedy <tjreedy@udel.edu> added the comment:
> Martin, I do not understand. The default hash is based on id (as is default equality comparison), not value. Are you OK with hash values changing if the 'value' changes? My understanding is that changing hash values for objects in sets and dicts is bad, which is why mutable builtins with value-based equality do not have hash values.
Hash values are based on the object values, not their id(). See the various type implementations for reference. The id() is only used as the hash for objects which don't have a "value" (and thus cannot be compared). Given that we have the invariant "a==b => hash(a)==hash(b)" in Python, it immediately follows that hash values for objects with a comparison method cannot have a lifetime, at least not within the same process and, depending on how you look at it, also not in multi-process applications. ---------- nosy: +lemburg
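Marc-Andre's invariant can be illustrated with built-in numbers, whose hashes are defined so that equal values of different types hash alike, and with a user class that derives its hash from exactly the fields its `__eq__` compares. This is a sketch under Python 3 semantics; `hashes_agree` and `Money` are hypothetical names used only for this example.

```python
from fractions import Fraction


def hashes_agree(a, b):
    """Check the invariant a == b => hash(a) == hash(b) for one pair."""
    return a != b or hash(a) == hash(b)


# Equal numbers of different types must produce equal hashes.
print(hashes_agree(1, 1.0))               # True
print(hashes_agree(True, 1))              # True
print(hashes_agree(Fraction(1, 2), 0.5))  # True


class Money:
    """Hypothetical value type: __hash__ uses the same key as __eq__.

    The key is only immutable by convention (a "private" attribute).
    """

    def __init__(self, currency, cents):
        self._key = (currency, cents)

    def __eq__(self, other):
        if not isinstance(other, Money):
            return NotImplemented
        return self._key == other._key

    def __hash__(self):
        return hash(self._key)


print(hashes_agree(Money("EUR", 100), Money("EUR", 100)))  # True
```

Because the hash is computed from the value rather than from id(), two separately created but equal `Money` objects land in the same set slot, which is exactly what the invariant requires.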
Antoine Pitrou <pitrou@free.fr> added the comment: You can define a __hash__ that changes if the object changes. It is not recommended, but it's possible. So I agree with Martin that your proposed clarification is wrong. (I also think that it wouldn't bring anything, either.) Suggest closing as invalid/rejected. ---------- nosy: +pitrou
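Antoine's point, that a `__hash__` which tracks a mutable attribute is legal but not recommended, can be made concrete. The sketch below assumes CPython 3.x and a hypothetical `MutableKey` class; the exact probing behavior of sets is an implementation detail, so treat the result as what CPython typically does rather than a language guarantee.

```python
class MutableKey:
    """Hypothetical class whose hash follows a mutable attribute (not recommended)."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if not isinstance(other, MutableKey):
            return NotImplemented
        return self.value == other.value

    def __hash__(self):
        # Allowed by the language, but the hash now changes whenever value does.
        return hash(self.value)


key = MutableKey(1)
bucket = {key}   # stored under hash(1)
key.value = 2    # hash(key) is now hash(2)

# The set searches under the new hash and misses the entry stored under the
# old one, even though the very same object is still inside the set.
print(key in bucket)
print(any(item is key for item in bucket))
```

On CPython the membership test reports False while the identity scan reports True: the object is in the set but can no longer be found through hashing.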
Terry J. Reedy <tjreedy@udel.edu> added the comment: Given that the doc says that use of hash() is to compare dict keys, it does not seem wrong to me to suggest that hash() should be usable to do so. I believe id() and consequently hash() are unique among builtins in being run-dependent. That is currently documented for id() but not for hash(). Given that people seriously asked whether we can randomize hash() with each run, because 'people' 'expect' it to remain rather constant, it does not seem useless to clarify that it can change with each run. I am sure my wording could be improved. An alternative would be 'Hash values for built-in objects are constant for each run but not necessarily thereafter.' If you take into account what people can do with special methods, some of the other entries seem more wrong than my suggestion. For instance: "len(s) Return the length (the number of items) of an object." and "str(obj ... When only object is given, this returns its nicely printable representation." These are true only for built-in objects, but the policy is to leave out the qualification.
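The run-to-run variability Terry mentions became the default in later releases: since Python 3.3, str and bytes hashes are randomized per interpreter process, and the PYTHONHASHSEED environment variable fixes the seed (0 disables randomization). A minimal sketch, assuming Python 3.7+ for `capture_output`; the helper name is hypothetical.

```python
import os
import subprocess
import sys


def str_hash_in_fresh_process(text, seed):
    """Hypothetical helper: hash(text) as computed by a new interpreter
    started with PYTHONHASHSEED set to the given seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    completed = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(completed.stdout)


if __name__ == "__main__":
    first = str_hash_in_fresh_process("spam", 1)
    again = str_hash_in_fresh_process("spam", 1)
    other = str_hash_in_fresh_process("spam", 2)
    print(first == again)  # same seed, same hash
    print(first == other)  # different seed: almost certainly a different hash
```

Spawning separate interpreters is the point of the design: within one process a string's hash never changes, so the variation only shows up across runs.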
Raymond Hettinger <raymond.hettinger@gmail.com> added the comment: -1 I concur with Martin. ---------- nosy: +rhettinger
Changes by Jesús Cea Avión <jcea@jcea.es>: ---------- nosy: +jcea
Raymond Hettinger <raymond.hettinger@gmail.com> added the comment: [Antoine]
> Suggest closing as invalid/rejected.
[Martin]
> -1. The hash has nothing to do with the lifetime, but with the value of an object.
---------- resolution: -> invalid status: open -> closed
participants (7)
- Alex Gaynor
- Antoine Pitrou
- Jesús Cea Avión
- Marc-Andre Lemburg
- Martin v. Löwis
- Raymond Hettinger
- Terry J. Reedy