[Python-Dev] Documentation Error for __hash__
Matt Giuca
matt.giuca at gmail.com
Fri Aug 29 13:03:22 CEST 2008
> Being hashable is a different from being usable as dictionary key.
>
> Dictionaries perform the lookup based on the hash value, but will
> then have to check for hash collisions based on an equal comparison.
>
> If an object does not define an equal comparison, then it is not
> usable as dictionary key.
>
But if an object defines *neither* __eq__ or __hash__, then by default it is
usable as a dictionary key (using the id() of the object for both default
equality and hashing, which is fine, and works for all user-defined types by
default).
An object defining __hash__ but not __eq__ is not problematic, since it
still uses id() default for equality. (It just has potentially bad
dictionary performance, if lots of things hash the same even though they
aren't equal). This it not a problem by definition because *it is officially
"okay" for two objects to hash the same, even if they aren't equal, though
undesirable*.
So all hashable objects are usable as dictionary keys, are they not? (As far
as I know it isn't possible to have an object that does not have an equality
comparison, unless you explicitly override __eq__ and have it raise a
TypeError for some reason).
It's probably a good idea to implement __hash__ for objects that
> implement comparisons, but it won't always work and it is certainly
> not needed, unless you intend to use them as dictionary keys.
>
But from what I know, it is a *bad* idea to implement __hash__ for any
mutable object with non-reference equality (where equality depends on the
mutable state), as an unbreakable rule. This is because if they are inserted
into a dictionary, then mutated, they may suddenly be in the wrong bucket.
This is why all the mutable types in Python with non-reference equality (eg.
list, set, dict) are explicitly not hashable, while the immutable types (eg.
tuple, frozenset, str) are hashable, and so are the mutable types with
reference equality (eg. functions, user-defined classes by default).
>
> > and that mutable objects should raise a TypeError in __hash__.
>
> That's a good idea, even though it's not needed either ;-)
>
So I think my above "axioms" are a better (less restrictive, and still
correct) rule than this one. It's OK for a mutable object to define
__hash__, as long as its __eq__ doesn't depend upon its mutable state. For
example, you can insert a function object into a dictionary, and mutate its
closure, and it won't matter, because neither the hash nor the equality of
the object is changing. It's only types like list and dict, with deep
equality, where you run into this hash table problem.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20080829/10d14a53/attachment.htm>
More information about the Python-Dev
mailing list