Why keep identity-based equality comparison?

spam.noam at gmail.com
Mon Jan 9 17:40:45 EST 2006


Hello,

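Guido has decided, on python-dev, that in Py3K the default (id- and
type-based) order comparisons will be dropped. This means that, for
example, "{} < []" will raise a TypeError instead of the current
behaviour. Currently it returns an arbitrary but consistent answer: for
two objects of the same type the result is really id(x) < id(y), and
for objects of different types, as here, CPython compares the type
names.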

He also said that default equality comparison will continue to be
identity-based. This means that x == y will never raise an exception,
as is the situation now. Here is his reason:

> Let me construct a hypothetical example: suppose we represent a car
> and its parts as objects. Let's say each wheel is an object. Each
> wheel is unique and we don't have equivalency classes for them.
> However, it would be useful to construct sets of wheels (e.g. the set
> of wheels currently on my car that have never had a flat tire). Python
> sets use hashing just like dicts. The original hash() and __eq__
> implementation would work exactly right for this purpose, and it seems
> silly to have to add it to every object type that could possibly be
> used as a set member (especially since this means that if a third
> party library creates objects for you that don't implement __hash__
> you'd have a hard time of adding it).
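The following minimal sketch makes Guido's point concrete. It is written for modern Python 3, and the Wheel class and its attributes are hypothetical. Wheel defines neither __eq__ nor __hash__, so it inherits the identity-based defaults. As a result, a set of wheels works with no extra code:

```python
class Wheel:
    """A hypothetical wheel with no __eq__ or __hash__ of its own."""
    def __init__(self, position):
        self.position = position
        self.had_flat = False

wheels = [Wheel(p) for p in ("FL", "FR", "RL", "RR")]
wheels[2].had_flat = True

# The inherited __hash__/__eq__ are identity-based, so each wheel is a distinct member.
never_flat = {w for w in wheels if not w.had_flat}
print(len(never_flat))            # 3
print(wheels[0] in never_flat)    # True
print(Wheel("FL") in never_flat)  # False: a new object, even with the same position
```

The last line shows the flip side: a distinct object never matches, whatever its attributes.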

Now, I don't think it should be so. My reason is basically "explicit is
better than implicit" - I think that the == operator should be reserved
for value-based comparison, and raise an exception if the two objects
can't be meaningfully compared by value. If you want to check if two
objects are the same, you can always do "x is y". If you want to create
a set of objects based on their identity (that is, two different
objects with the same value are considered different elements), you
have two options:
1. Create another set type that is identity-based: it ignores the
objects' hash values and simply collects references to them. Instead
of set(), you would use, say, idset(), and everything would work as
wanted.
2. Write a class like this:

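class Ref(object):
    """Wraps an object so that equality and hashing use its identity."""
    def __init__(self, obj):
        self._obj = obj
    def __call__(self):
        # Return the wrapped object.
        return self._obj
    def __eq__(self, other):
        # Two Refs are equal only if they wrap the very same object.
        return isinstance(other, Ref) and self._obj is other._obj
    def __ne__(self, other):
        # Python 2 does not derive != from __eq__, so define it explicitly.
        return not self == other
    def __hash__(self):
        # Consistent with __eq__: the same object always has the same id().
        return id(self._obj) ^ 0xBEEF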

and use it like this:

st = set()
st.add(Ref(wheel1))
st.add(Ref(wheel2))
if Ref(wheel1) in st:
    ...

These solutions allow the class author to define a value-based
comparison operator. They also allow the user of the class to state
explicitly whether he wants value-based or identity-based behaviour.
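Option 1, the idset() type, does not exist in Python. The following is a minimal sketch of what it might look like in modern Python 3. The IdSet name and its design are hypothetical, built on collections.abc.MutableSet. It keys an internal dict on id(obj) and stores the object itself as the value. This keeps the object alive, so its id cannot be reused while it is in the set:

```python
from collections.abc import MutableSet

class IdSet(MutableSet):
    """Hypothetical identity-based set: membership never calls __eq__ or __hash__."""

    def __init__(self, iterable=()):
        # id(obj) -> obj; holding obj keeps it alive, so its id stays unique.
        self._items = {}
        for obj in iterable:
            self.add(obj)

    def __contains__(self, obj):
        return id(obj) in self._items

    def __iter__(self):
        return iter(self._items.values())

    def __len__(self):
        return len(self._items)

    def add(self, obj):
        self._items[id(obj)] = obj

    def discard(self, obj):
        self._items.pop(id(obj), None)

a, b = [1, 2], [1, 2]  # equal values, different objects
s = IdSet([a])
print(a in s, b in s)  # True False
```

Membership never calls __hash__, so even unhashable objects such as lists can be stored. An iddict could be sketched the same way on collections.abc.MutableMapping.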

A few more examples of why this explicit behaviour is good:

* Things like "Decimal('3.0') == 3.0" will make more sense. Instead of
returning False, the comparison will raise an exception explaining
that decimals should not be compared to floats.
* You won't be able to use objects as dictionary keys expecting them to
be compared by value, only to get a bug when they aren't. I recently
wrote a sort-of OCR program, which contains a mapping from a numarray
array of bits to a character (the array is the pixel image of the
character). Everything seemed to work, but the program didn't recognize
any characters. I discovered that the reason was that arrays are hashed
according to their identity, something I had to guess. If the default
== operator were not defined, I would simply have got a TypeError
immediately.
* It is more forward-compatible. If it is later found that two types
can sensibly be compared, the comparison can be defined without
changing any existing result, since such comparisons used to raise an
exception.
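The OCR bug in the second point can be reproduced without numarray. The following is a minimal sketch in modern Python 3; Glyph and StrictGlyph are hypothetical stand-ins for the pixel array.

- Glyph inherits identity-based equality and hashing, so looking up an equal-valued copy silently fails.
- StrictGlyph sets __hash__ = None, which makes Python raise TypeError as soon as an instance is used as a key. This is roughly the fail-fast behaviour argued for above.

```python
class Glyph:
    """Hypothetical pixel image: no __eq__/__hash__, so identity is used."""
    def __init__(self, bits):
        self.bits = bits

class StrictGlyph:
    """Same data, but opts out of hashing so misuse as a key fails loudly."""
    __hash__ = None  # instances are unhashable
    def __init__(self, bits):
        self.bits = bits

def recognize(table, glyph):
    # Return the character mapped to this glyph, or None if not found.
    return table.get(glyph)

# Build the table with one object, then look up an equal-valued copy.
table = {Glyph((0, 1, 1, 0)): "A"}
print(recognize(table, Glyph((0, 1, 1, 0))))  # None: keys are compared by identity

try:
    {StrictGlyph((0, 1, 1, 0)): "A"}
except TypeError as exc:
    # Raised immediately, when the key is hashed.
    print("TypeError:", exc)
```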

My question is: not counting backwards compatibility, what reasons are
left for keeping the current default equality operator in Py3K?
(Assume that idset and iddict exist, so in Guido's example the cost of
explicitness is only two characters.)

Thanks,
Noam



