Are user-defined objects good keys?

Tim Peters tim.one at home.com
Sat Sep 29 22:53:37 EDT 2001


[Greg Weeks]
> The issue of identity, equality, and hashing strikes me as unsettled in
> general.  And, in my opinion, Python "got it wrong".  But I don't want to
> thrash out this issue.  I'd just like to know how to program.  So:
>
> Suppose I define a class.  I don't define __cmp__; so two instances are
> "==" iff they have the same id.

For this to be true, it must be that you also don't define __eq__.

> I also don't define __hash__.  Also, the instances are mutable.
>
> Let inst be an instance, and let n be an integer.  In my python
> installation, the following are true:
>
> A.    hash(n) == n

Not quite true:  try hash(-1).

> B.    hash(inst) == hash(id(inst))

Probably true on all platforms today, but not guaranteed.

> Regarding A: Is hash() used to create hash indices for dictionaries?

I don't know what you mean by "hash indices", but the answer is probably
"yes".  hash() produces a hash code, and the initial slot in an object's
hash table probe sequence is related to that, but is not exactly that.

> If so, is the identity function a good hash function for integer keys?

Yes, because across most common uses of integer-indexed hash tables, it
provides better behavior (fewer collisions) than a truly random hash
function would yield.  Python's dicts exhibit "better than random" behavior
in many common situations, and precisely because its hash functions try to
exploit common regularities to avoid collisions.

> Regarding B: Is B true for all python implementations?

It's not defined, and despite what you may think, you can make no good use
of "the answer".  What is defined is that x == y implies hash(x) == hash(y),
and that's the only thing that's defined about hash codes and their relation
to __eq__.  It's also the only thing that *needs* to be defined, and the
only thing your own __eq__ and __hash__ implementations need to guarantee.

> If not, then is it true that
>
> C.    hash(inst) == fn(id(inst))
>
> for *some* function fn()?

Obviously so, but the function fn() may not be any simpler than an
exhaustive listing <wink>.

> If either B or C is true, then most user-defined mutable instances
> can be used as dictionary keys.

You don't need B or C for that; so long as you're happy with identity
comparison, you only need that identity equality implies hash equality (and
it does, albeit trivially).

> I say "most" because most user-defined mutable most user-defined mutable
> instances will not redefine __cmp__.)

I don't believe that:  most mutable classes I've seen do define __cmp__, or
at least __eq__.

> My central question is: Can I use user-defined mutable instances (that
> are "==" iff their ids are equal) as dictionary keys?

Yes, provided you're happy with identity equality, and don't screw yourself
by defining an incompatible __hash__, __cmp__ or __eq__ method
("incompatible" meaning any combination of those violating the "x == y
implies hash(x) == hash(y)" constraint).

> On my system I can.  What about others?

Yes.  I suspect your view of this is more complicated than what Python does,
although I'm having a hard time seeing exactly where you jumped off the deep
end <wink>.





More information about the Python-list mailing list