[Python-Dev] Should the default equality operator compare values instead of identities?

Wed Nov 2 21:36:54 CET 2005

I think it should.

(I am copying here messages from the thread about the default hash method.)

On 11/2/05, Michael Chermside <mcherm at mcherm.com> wrote:
> > Why not make the default __eq__ really compare the objects, that is,
> > their dicts and their slot-members?
>
> Short answer: not the desired behavior. Longer answer: there are
> three common patterns in object design. There are "value" objects,
> which should be considered equal if all fields are equal. There are
> "identity" objects which are considered equal only when they are
> the same object. And then there are (somewhat less common) "value"
> objects in which a few fields don't count -- they may be used for
> caching a pre-computed result for example. The default __eq__
> behavior has to cater to one of these -- clearly either "value"
> objects or "identity" objects. Guido chose to cater to "identity"
> objects believing that they are actually more common in most
> situations. A beneficial side-effect is that the default behavior
> of __eq__ is QUITE simple to explain, and if the implementation is
> easy to explain then it may be a good idea.
>
This is a very nice observation. I wish to explain why I think that
the default __eq__ should compare values, not identities.

1. If you want to compare identities, you can always use "is". There
is currently no easy way to compare instances of your user-defined
classes by value when they happen to be "value objects", in Michael's
terminology: you have to compare every single member. (Comparing the
__dict__ attributes is ugly and will not always work, for example
with __slots__.) If the default compared objects by value and yours
happened to be "identity objects", you could always write:
    def __eq__(self, other):
        return self is other

2. I believe that, contrary to what Michael said, "value objects" are
more common than "identity objects", at least among user-defined
classes, and especially among simple user-defined classes. Those are
where the defaults matter most, since their authors won't bother to
define all the appropriate protocols. Can you give examples of common
"identity objects"? I believe they usually deal with input/output,
that is, with things that interact with the environment (files, for
example). I believe almost all "algorithmic" classes are "value
objects". And I think that value-based comparison will usually give
the correct result for "identity objects" too: if they do I/O, they
will usually hold a reference to an I/O object, such as a file, which
is an "identity object" itself. Value-based comparison will therefore
compare those I/O objects and return False, since the I/O objects the
two instances hold are not the same one.

3. I think that value-based comparison is also quite easy to explain:
user-defined classes combine functions with a data structure. In
Python, the "data structure" is simply a set of member names that
reference other objects. The default value-based comparison would
check whether two objects have the same member names and whether those
members reference equal (by value) objects; if so, it would return
True. I think that explaining this is no harder than explaining the
current dict comparison.
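
The three points can be made concrete with a small sketch in modern
Python 3 (the thread predates it, so this is illustrative only).
`_state`, `ValueEq`, `PlainPoint`, `Point`, `Slotted` and `Logger` are
all hypothetical names, and `ValueEq` is just one possible reading of
the rule in point 3: two objects are equal if they have the same type,
the same member names (instance `__dict__` plus any slot members that
are set), and equal values for those members. `PlainPoint` shows the
current identity-based default from point 1, and the two `Logger`
comparisons illustrate point 2: classes holding distinct `io.StringIO`
streams, which compare by identity, come out unequal.

```python
import io


def _state(obj):
    # Collect an instance's "data structure": its __dict__ entries plus
    # any slot members that are currently set. (Name-mangled private
    # slots such as __x are not handled in this sketch.)
    state = dict(getattr(obj, "__dict__", {}))
    for cls in type(obj).__mro__:
        slots = cls.__dict__.get("__slots__", ())
        if isinstance(slots, str):
            slots = (slots,)
        for name in slots:
            if name in ("__dict__", "__weakref__"):
                continue
            if hasattr(obj, name):
                state[name] = getattr(obj, name)
    return state


class ValueEq:
    # Hypothetical mixin sketching a value-based default: equal if same
    # type, same member names, and members referencing equal objects.
    __slots__ = ()

    def __eq__(self, other):
        if type(other) is not type(self):
            return NotImplemented
        return _state(self) == _state(other)

    # Value-equal objects must not hash differently, so this sketch
    # makes instances unhashable instead of keeping the identity hash.
    __hash__ = None


class PlainPoint:
    # No __eq__: uses the current identity-based default.
    def __init__(self, x, y):
        self.x = x
        self.y = y


class Point(ValueEq):
    # Same data as PlainPoint, but with the sketched value-based default.
    def __init__(self, x, y):
        self.x = x
        self.y = y


class Slotted(ValueEq):
    # Slot members take part in the comparison too.
    __slots__ = ("a", "b")

    def __init__(self, a, b):
        self.a = a
        self.b = b


class Logger(ValueEq):
    # A class doing I/O through a stream object it holds.
    def __init__(self, stream, prefix):
        self.stream = stream
        self.prefix = prefix


shared = io.StringIO()
print(PlainPoint(1, 2) == PlainPoint(1, 2))  # False: identity default
print(Point(1, 2) == Point(1, 2))            # True: equal member values
print(Slotted(1, 2) == Slotted(1, 3))        # False: slot b differs
print(Logger(shared, "> ") == Logger(shared, "> "))  # True: same stream
print(Logger(io.StringIO(), "> ") == Logger(io.StringIO(), "> "))  # False
```

Making `ValueEq` instances unhashable is a choice made only for this
sketch, since objects that compare equal must not have different hashes.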

Now, for Josiah's reply:

On 11/2/05, Josiah Carlson <jcarlson at uci.edu> wrote:
> > This leads me to another question: why should the default __eq__
> > method be the same as "is"? If someone wants to check if two objects
> > are the same object, that's what the "is" operator is for. Why not
> > make the default __eq__ really compare the objects, that is, their
> > dicts and their slot-members?
>
> Using 'is' makes sense when the default hash is id (and actually in
> certain other cases as well). Actually comparing the contents of an
> object is certainly not desirable with the default hash, and probably
> not desirable in the general case because equality doesn't always
> depend on /all/ attributes of extension objects.
>
>    Explicit is better than implicit.
>    In the face of ambiguity, refuse the temptation to guess.
>
I hope that the default hash will stop being id, as Josiah showed
Guido has decided, so let's not discuss it.
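
For reference, Josiah's remark that comparing contents does not go
well with the default hash can be made concrete. Dicts and sets rely
on the rule that `a == b` implies `hash(a) == hash(b)`, and a
value-based `__eq__` combined with the identity-based hash breaks it.
This is a minimal Python 3 sketch with a hypothetical class
`IdHashValueEq`; the printed hash and set results describe CPython,
where the default hash is derived from the object's id.

```python
class IdHashValueEq:
    # Hypothetical class: value-based __eq__, but the identity-based
    # default hash kept on purpose. (Python 3 would otherwise set
    # __hash__ to None because the class defines __eq__.)
    def __init__(self, x):
        self.x = x

    def __eq__(self, other):
        if type(other) is not type(self):
            return NotImplemented
        return self.x == other.x

    __hash__ = object.__hash__


a, b = IdHashValueEq(1), IdHashValueEq(1)
print(a == b)              # True
print(hash(a) == hash(b))  # False in CPython: each hash comes from the id
print(len({a, b}))         # 2 in CPython, even though a == b
```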

Now, about the good point that sometimes the state doesn't depend on
all the attributes: right. But the current default doesn't handle that
case either; you have no choice but to write an equality operator
yourself. And I think this is not the common case.

I think that the meaning of "in the face of ambiguity, refuse the
temptation to guess" is that you should not write code that changes
its behaviour according to what the user will do, based on your guess
as to what he meant. That is not the case here: value-based comparison
is strictly defined. It may simply not be what the user wants, but in
most cases I think it will be.

"Explicit is better than implicit" only says "better". Identity-based
comparison is just as implicit as value-based comparison.

(I want to add that there is a simple way to support value-based
comparison when some members don't count: write a metaclass that
checks whether your class has a member like
__non_state_members__ = ["_calculated_hash", "debug_member"]
and, if so, does not compare those members in the default
equality-testing method. I would say that this could even be made the
behavior of the default type.)
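
The metaclass idea might look roughly like this, sketched in Python 3
syntax. Only `__non_state_members__` comes from the proposal above;
`ValueEqMeta`, `_value_eq` and `Circle` are hypothetical names, and for
brevity the sketch compares only the instance `__dict__`, not slot
members.

```python
def _value_eq(self, other):
    # Compare instance dicts, ignoring names in __non_state_members__.
    if type(other) is not type(self):
        return NotImplemented
    skip = set(getattr(type(self), "__non_state_members__", ()))
    mine = {k: v for k, v in vars(self).items() if k not in skip}
    theirs = {k: v for k, v in vars(other).items() if k not in skip}
    return mine == theirs


class ValueEqMeta(type):
    # Hypothetical metaclass: classes that don't define __eq__ themselves
    # get the value-based _value_eq as their default.
    def __new__(mcls, name, bases, namespace, **kwargs):
        if "__eq__" not in namespace:
            namespace["__eq__"] = _value_eq
            # Keep hashing consistent with equality: make instances
            # unhashable unless the class supplies its own __hash__.
            namespace.setdefault("__hash__", None)
        return super().__new__(mcls, name, bases, namespace, **kwargs)


class Circle(metaclass=ValueEqMeta):
    __non_state_members__ = ["_calculated_hash", "debug_member"]

    def __init__(self, r):
        self.r = r
        self.debug_member = None


a, b = Circle(2), Circle(2)
a.debug_member = "only set on a"
print(a == b)          # True: debug_member is ignored
print(a == Circle(3))  # False: r differs
```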

> I believe the current behavior of __eq__ is more desirable than
> comparing contents, as this may result in undesirable behavior
> (recursive compares on large nested objects are now slow, which used to
> be fast because default methods wouldn't cause a recursive comparison at
> all).

But if the default method doesn't do what you want, it doesn't matter
how fast it is. Remember that it's very easy to trigger recursive
comparisons, by comparing lists for example, and that hasn't bothered
anyone.
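
For comparison, Python's built-in containers already compare
recursively. This short snippet (the helper `nested` is hypothetical)
shows equality descending through nested lists, dicts and tuples, the
same kind of recursive work a value-based default for user classes
would do.

```python
def nested(depth):
    # Build [[...[0]...]] with `depth` extra levels of list nesting.
    x = [0]
    for _ in range(depth):
        x = [x]
    return x


a = [1, [2, [3, {"k": (4, 5)}]]]
b = [1, [2, [3, {"k": (4, 5)}]]]
print(a == b)  # True: nested containers compared element by element
print(a is b)  # False: two distinct list objects

# Equality has to walk all 100 levels before it can answer.
print(nested(100) == nested(100))  # True
```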

To summarize, I think that value-based equality testing would usually
be what you want, and currently implementing it is a bit of a pain.

Concerning backwards compatibility: show a warning in Python 2.5 when
the default equality test is used, and change it in Python 2.6.
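
The transition step could look roughly like this. It is a minimal
sketch in Python 3 syntax: `TransitionalObject` and `Thing` are
hypothetical, and the choice of `FutureWarning` (rather than, say,
`DeprecationWarning`) is an assumption. The default keeps returning the
identity-based answer but warns when it is used, so code relying on it
can be found before the behavior changes.

```python
import warnings


class TransitionalObject:
    # Hypothetical base class for the transition period: keeps the
    # identity-based result but warns that the default will change.
    def __eq__(self, other):
        warnings.warn(
            "default equality compares identities; "
            "a future version will compare values",
            FutureWarning,
            stacklevel=2,
        )
        return self is other

    __hash__ = object.__hash__


class Thing(TransitionalObject):
    pass


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = Thing() == Thing()

print(result)                       # False: still identity-based
print(caught[0].category.__name__)  # FutureWarning
```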

Comments, please!

Thanks,
Noam