[Python-Dev] Documentation Error for __hash__

M.-A. Lemburg mal at egenix.com
Fri Aug 29 14:43:11 CEST 2008


On 2008-08-29 13:03, Matt Giuca wrote:
>> Being hashable is a different from being usable as dictionary key.
>>
>> Dictionaries perform the lookup based on the hash value, but will
>> then have to check for hash collisions based on an equal comparison.
>>
>> If an object does not define an equal comparison, then it is not
>> usable as dictionary key.
>>
> 
> But if an object defines *neither* __eq__ or __hash__, then by default it is
> usable as a dictionary key (using the id() of the object for both default
> equality and hashing, which is fine, and works for all user-defined types by
> default).
> 
> An object defining __hash__ but not __eq__ is not problematic, since it
> still uses id() default for equality. (It just has potentially bad
> dictionary performance, if lots of things hash the same even though they
> aren't equal). This it not a problem by definition because *it is officially
> "okay" for two objects to hash the same, even if they aren't equal, though
> undesirable*.

Note that only instances have the default hash value id(obj). This
is not true in general. Most types don't implement the tp_hash
slot and thus are not hashable. Indeed, mutable types should not
implement that slot unless they know what they're doing :-)

> So all hashable objects are usable as dictionary keys, are they not? (As far
> as I know it isn't possible to have an object that does not have an equality
> comparison, unless you explicitly override __eq__ and have it raise a
> TypeError for some reason).

Sorry, I wasn't clear enough: with "not defining an equal comparison"
I meant that an equal comparison does not succeed, ie. raises an
exception or returns Py_NotImplemented (at the C level).

Due to the default of using the id(obj) as fallback for the equal
comparison, this has to be explicitly coded for. If this is not
the case (and that's probably the most common situation),
then you're right: hashable implies usable as dictionary key.

>> It's probably a good idea to implement __hash__ for objects that
>> implement comparisons, but it won't always work and it is certainly
>> not needed, unless you intend to use them as dictionary keys.
>>
> 
> But from what I know, it is a *bad* idea to implement __hash__ for any
> mutable object with non-reference equality (where equality depends on the
> mutable state), as an unbreakable rule. This is because if they are inserted
> into a dictionary, then mutated, they may suddenly be in the wrong bucket.
> This is why all the mutable types in Python with non-reference equality (eg.
> list, set, dict) are explicitly not hashable, while the immutable types (eg.
> tuple, frozenset, str) are hashable, and so are the mutable types with
> reference equality (eg. functions, user-defined classes by default).

Right.

By implementing __hash__ in classes you have the explicit choice of
either raising an exception or returning a useful hash value.

Again, the situation is better at the C level, since types
don't have a default tp_hash implementation, so have to explicitly
code such a slot in order for hash(obj) to work.

>>> and that mutable objects should raise a TypeError in __hash__.
>> That's a good idea, even though it's not needed either ;-)
>>
> 
> So I think my above "axioms" are a better (less restrictive, and still
> correct) rule than this one. It's OK for a mutable object to define
> __hash__, as long as its __eq__ doesn't depend upon its mutable state. For
> example, you can insert a function object into a dictionary, and mutate its
> closure, and it won't matter, because neither the hash nor the equality of
> the object is changing. It's only types like list and dict, with deep
> equality, where you run into this hash table problem.

I think the documentation needs to be changed to make the defaults
explicit.

The documentation should probably say:

"If you implement __cmp__ or
__eq__ on a class, also implement a __hash__ method (and either
have it raise an exception or return a valid non-changing hash
value for the object)."

"If you implement __hash__ on classes, you should consider implementing
__eq__ and/or __cmp__ as well, in order to control how dictionaries use
your objects."

In general, it's probably best to always implement both methods
on classes, even if the application will just use one of them.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Aug 29 2008)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

:::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! ::::


   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611


More information about the Python-Dev mailing list