[Python-Dev] == on object tests identity in 3.x - list delegation to members?

Sun Jul 13 17:13:20 CEST 2014

On 11.07.2014 22:54, Ethan Furman wrote:
> On 07/11/2014 07:04 AM, Andreas Maier wrote:
>> On 09.07.2014 03:48, Raymond Hettinger wrote:
>>>
>>> Personally, I see no need to make the same mistake by removing
>>> the identity-implies-equality rule from the built-in containers.
>>> There's no need to upset the apple cart for nearly zero benefit.
>>
>> Containers delegate the equal comparison on the container to their
>> elements; they do not apply identity-based comparison
>> to their elements. At least that is the externally visible behavior.
>
> If that were true, then [NaN] == [NaN] would be False, and it is not.
>
> Here is the externally visible behavior:
>
> Python 3.5.0a0 (default:34881ee3eec5, Jun 16 2014, 11:31:20)
> [GCC 4.7.3] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> --> NaN = float('nan')
> --> NaN == NaN
> False
> --> [NaN] == [NaN]
> True

Ouch, that hurts ;-)

First, the delegation of sequence equality to element equality is not 
something I came up with in my doc patch. It has always been in
"5.9 Comparisons" of the Language Reference (quoted from Python 3.4):

"Tuples and lists are compared lexicographically using comparison of 
corresponding elements. This means that to compare equal, each element 
must compare equal and the two sequences must be of the same type and 
have the same length."

Second, if not by delegating to the equality of its elements, how else 
would the equality of sequences be defined?
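For contrast, a purely delegating definition that follows the quoted 
Language Reference text literally (same type, same length, element-wise 
==) could be sketched as below. The function name naive_seq_eq is made up 
for illustration; it is not how CPython implements list equality, and the 
last line shows where the two disagree.

```python
def naive_seq_eq(a, b):
    """Hypothetical sequence equality: element-wise ==, no identity shortcut."""
    if type(a) is not type(b) or len(a) != len(b):
        return False
    # Every pair of corresponding elements must compare equal via ==.
    return all(x == y for x, y in zip(a, b))


nan = float('nan')
print(naive_seq_eq([1, 2], [1, 2]))   # True
print(naive_seq_eq([nan], [nan]))     # False: nan == nan is False
print([nan] == [nan])                 # True: built-in list equality disagrees
```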

But your test is definitely worth a closer look. I have broadened the 
test somewhat, and that raises further questions. Here is the test 
output, with a discussion of the results (the test program try_eq.py 
and its output test_eq.out are attached to issue #12067):
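The attached try_eq.py is not reproduced in this mail. The following is 
only a hypothetical reconstruction of the kind of helper that produces 
cases a) to f) for a pair of objects; the name run_cases and its return 
value are my assumptions, not the attached program.

```python
def run_cases(obj1, obj2):
    """Hypothetical helper: print and return the a) to f) results."""
    print(f"   obj1: type={type(obj1)}, str={obj1}, id={id(obj1)}")
    print(f"   obj2: type={type(obj2)}, str={obj2}, id={id(obj2)}")
    cases = [
        ("a) obj1 is obj2", lambda: obj1 is obj2),
        ("b) obj1 == obj2", lambda: obj1 == obj2),
        ("c) [obj1] == [obj2]", lambda: [obj1] == [obj2]),
        ("d) {obj1:'v'} == {obj2:'v'}", lambda: {obj1: 'v'} == {obj2: 'v'}),
        ("e) {'k':obj1} == {'k':obj2}", lambda: {'k': obj1} == {'k': obj2}),
        # f) repeats b) to check whether comparison results get cached.
        ("f) obj1 == obj2", lambda: obj1 == obj2),
    ]
    results = {}
    for label, check in cases:
        value = check()          # any __eq__() debug output appears here
        print(f"   {label}: {value}")
        results[label[0]] = value
    return results


nan = float('nan')
run_cases(nan, nan)    # same NaN object, as in test #6 below
```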

Test #1: Different equal int objects:

   obj1: type=<class 'int'>, str=257, id=39305936
   obj2: type=<class 'int'>, str=257, id=39306160

   a) obj1 is obj2: False
   b) obj1 == obj2: True
   c) [obj1] == [obj2]: True
   d) {obj1:'v'} == {obj2:'v'}: True
   e) {'k':obj1} == {'k':obj2}: True
   f) obj1 == obj2: True

Discussion:

Case 1.c) can be interpreted as the list delegating its == to the == on 
its elements. It cannot be interpreted as delegating to identity 
comparison. That is consistent with how everyone (I hope ;-) would 
expect int objects, or lists or dicts of them, to behave.

The motivation for case f) is explained further down; it has to do with 
caching.

Test #2: Same int object:

   obj1: type=<class 'int'>, str=257, id=39305936
   obj2: type=<class 'int'>, str=257, id=39305936

   a) obj1 is obj2: True
   b) obj1 == obj2: True
   c) [obj1] == [obj2]: True
   d) {obj1:'v'} == {obj2:'v'}: True
   e) {'k':obj1} == {'k':obj2}: True
   f) obj1 == obj2: True

-> No surprises (I hope).

Test #3: Different equal float objects:

   obj1: type=<class 'float'>, str=257.0, id=5734664
   obj2: type=<class 'float'>, str=257.0, id=5734640

   a) obj1 is obj2: False
   b) obj1 == obj2: True
   c) [obj1] == [obj2]: True
   d) {obj1:'v'} == {obj2:'v'}: True
   e) {'k':obj1} == {'k':obj2}: True
   f) obj1 == obj2: True

Discussion:

I added this test only to show that float NaN is a special case: float 
objects that are not NaN behave just like the int objects of test #1.

Test #4: Same float object:

   obj1: type=<class 'float'>, str=257.0, id=5734664
   obj2: type=<class 'float'>, str=257.0, id=5734664

   a) obj1 is obj2: True
   b) obj1 == obj2: True
   c) [obj1] == [obj2]: True
   d) {obj1:'v'} == {obj2:'v'}: True
   e) {'k':obj1} == {'k':obj2}: True
   f) obj1 == obj2: True

-> Same as test #2, hopefully no surprises.

Test #5: Different float NaN objects:

   obj1: type=<class 'float'>, str=nan, id=5734784
   obj2: type=<class 'float'>, str=nan, id=5734976

   a) obj1 is obj2: False
   b) obj1 == obj2: False
   c) [obj1] == [obj2]: False
   d) {obj1:'v'} == {obj2:'v'}: False
   e) {'k':obj1} == {'k':obj2}: False
   f) obj1 == obj2: False

Discussion:

Here, the list behaves as I would expect under the rule that it 
delegates equality to its elements. Case c) allows that interpretation. 
However, an interpretation based on identity would also be possible.

Test #6: Same float NaN object:

   obj1: type=<class 'float'>, str=nan, id=5734784
   obj2: type=<class 'float'>, str=nan, id=5734784

   a) obj1 is obj2: True
   b) obj1 == obj2: False
   c) [obj1] == [obj2]: True
   d) {obj1:'v'} == {obj2:'v'}: True
   e) {'k':obj1} == {'k':obj2}: True
   f) obj1 == obj2: False

Discussion (this is Ethan's example):

Case 6.b) shows the documented special behavior of float NaN: a float 
NaN object is identical to itself but not equal to itself.

Case 6.c) is the surprising case. It could be interpreted in two ways 
(at least that's what I found):

1) The comparison is based on the identity of the float objects. But that 
is inconsistent with test #3, where distinct but equal float objects 
compare equal inside lists. And why would the list special-case NaN 
comparison in a way that ends up inconsistent with the special 
definition of NaN outside the list?

2) The list does not always delegate to element equality, but takes a 
shortcut when the objects are the same (same identity). We will see 
later that this does happen. Further, when comparing float NaNs of the 
same identity, the list implementation forgot to special-case NaNs. That 
would be a bug, IMHO. I did not analyze the C implementation, so this is 
all speculation based on externally visible behavior.
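Tests #5 and #6 can be reproduced directly at the interactive prompt; 
this uses only built-ins and the standard math module:

```python
import math

nan_a = float('nan')
nan_b = float('nan')          # a second, distinct NaN object

print(nan_a is nan_b)         # False
print(math.isnan(nan_a) and math.isnan(nan_b))  # True
print(nan_a == nan_a)         # False: NaN is never equal to itself
print([nan_a] == [nan_b])     # False: different objects, element == is False
print([nan_a] == [nan_a])     # True: same object, identity short-circuits
```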

Test #7: Different objects (with equal x) of class C
    (C.__eq__() implemented with equality of x,
     C.__ne__() returning NotImplemented):

   obj1: type=<class '__main__.C'>, str=C(256), id=39406504
   obj2: type=<class '__main__.C'>, str=C(256), id=39406616

   a) obj1 is obj2: False
C.__eq__(): self=39406504, other=39406616, returning True
   b) obj1 == obj2: True
C.__eq__(): self=39406504, other=39406616, returning True
   c) [obj1] == [obj2]: True
C.__eq__(): self=39406616, other=39406504, returning True
   d) {obj1:'v'} == {obj2:'v'}: True
C.__eq__(): self=39406504, other=39406616, returning True
   e) {'k':obj1} == {'k':obj2}: True
C.__eq__(): self=39406504, other=39406616, returning True
   f) obj1 == obj2: True

The __eq__() and __ne__() implementations each print a debug message. 
The __ne__() is only defined to verify that it is not invoked, and that 
the inherited default __ne__() does not chime in.

Discussion:

Here we see that the list equality comparison does invoke the element 
equality. However, the picture becomes more complex further down.
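The class definitions are in the attached try_eq.py and not shown here. A 
plausible minimal reconstruction of class C, assuming a single data 
attribute x and a __hash__() derived from x (the 7.d output suggests one, 
since __eq__() is called during the dict key lookup), could look like 
this:

```python
class C:
    """Hypothetical reconstruction: equality based on the data attribute x."""

    def __init__(self, x):
        self.x = x

    def __repr__(self):
        return f"C({self.x})"

    def __eq__(self, other):
        result = self.x == other.x
        print(f"C.__eq__(): self={id(self)}, other={id(other)}, returning {result}")
        return result

    def __ne__(self, other):
        # Only here to show whether it ever gets called.
        print(f"C.__ne__(): self={id(self)}, other={id(other)}, returning NotImplemented")
        return NotImplemented

    def __hash__(self):
        # Assumed: hash derived from x, so equal objects share a dict bucket.
        return hash(self.x)


obj1, obj2 = C(256), C(256)
print([obj1] == [obj2])    # one C.__eq__() line, then True
print([obj1] == [obj1])    # no C.__eq__() line, then True
```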

Test #8: Same object of class C
    (C.__eq__() implemented with equality of x,
     C.__ne__() returning NotImplemented):

   obj1: type=<class '__main__.C'>, str=C(256), id=39406504
   obj2: type=<class '__main__.C'>, str=C(256), id=39406504

   a) obj1 is obj2: True
C.__eq__(): self=39406504, other=39406504, returning True
   b) obj1 == obj2: True
   c) [obj1] == [obj2]: True
   d) {obj1:'v'} == {obj2:'v'}: True
   e) {'k':obj1} == {'k':obj2}: True
C.__eq__(): self=39406504, other=39406504, returning True
   f) obj1 == obj2: True

Discussion:

The == on the class C objects in case 8.b) invokes __eq__(), even though 
both operands are the same object. This can be explained by Python's 
intent that classes should be able to be non-reflexive if needed, as 
float NaN is.

Now, the list equality in case 8.c) is interesting: it does not invoke 
element equality at all. Even though object equality in case 8.b) did 
not assume reflexivity and invoked the __eq__() method, the list seems 
to assume reflexivity and goes by object identity.

The only other explanation I could find would be that some aspects of 
the comparison behavior are cached. That's why I added the f) cases, 
which show that comparison results are not cached (the __eq__() method 
is invoked again).

So we are back to discussing why element equality does not assume 
reflexivity, but list equality does. IMHO, that is another bug, or maybe 
the same one.
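The difference between 8.b) and 8.c) can also be measured without reading 
debug output, by counting __eq__() calls. The class name Counted is made 
up for this sketch:

```python
class Counted:
    """Hypothetical helper: counts __eq__() calls; every comparison says equal."""
    calls = 0

    def __eq__(self, other):
        Counted.calls += 1
        return True


obj = Counted()

Counted.calls = 0
obj == obj
print(Counted.calls)             # 1: the == operator calls __eq__()

Counted.calls = 0
[obj] == [obj]
print(Counted.calls)             # 0: list equality skips __eq__() for identical items

Counted.calls = 0
[Counted()] == [Counted()]
print(Counted.calls)             # 1: distinct items, so __eq__() is called
```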

Test #9: Different objects (with equal x) of class D
    (D.__eq__() implemented with inequality of x,
     D.__ne__() returning NotImplemented):

   obj1: type=<class '__main__.D'>, str=C(256), id=39407064
   obj2: type=<class '__main__.D'>, str=C(256), id=39406952

   a) obj1 is obj2: False
D.__eq__(): self=39407064, other=39406952, returning False
   b) obj1 == obj2: False
D.__eq__(): self=39407064, other=39406952, returning False
   c) [obj1] == [obj2]: False
D.__eq__(): self=39406952, other=39407064, returning False
   d) {obj1:'v'} == {obj2:'v'}: False
D.__eq__(): self=39407064, other=39406952, returning False
   e) {'k':obj1} == {'k':obj2}: False
D.__eq__(): self=39407064, other=39406952, returning False
   f) obj1 == obj2: False

Discussion:

Class D implements __eq__() using != on the data attribute. This test 
does not show any surprises and is consistent with the theory that list 
comparison delegates to element comparison. It is really just a 
preparation for the next test, which uses the same object of this class.
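A matching standalone sketch of class D, again a hypothetical 
reconstruction with a __hash__() derived from x (unlike the output 
above, it prints D(...) rather than C(...)):

```python
class D:
    """Hypothetical reconstruction of class D: __eq__() uses inequality of x."""

    def __init__(self, x):
        self.x = x

    def __repr__(self):
        return f"D({self.x})"

    def __eq__(self, other):
        result = self.x != other.x      # deliberately non-reflexive
        print(f"D.__eq__(): self={id(self)}, other={id(other)}, returning {result}")
        return result

    def __ne__(self, other):
        # Only here to show whether it ever gets called.
        print(f"D.__ne__(): self={id(self)}, other={id(other)}, returning NotImplemented")
        return NotImplemented

    def __hash__(self):
        # Assumed: hash derived from x, as for class C.
        return hash(self.x)


obj1, obj2 = D(256), D(256)
print(obj1 == obj2)        # one D.__eq__() line, then False
print([obj1] == [obj2])    # one D.__eq__() line, then False
```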

Test #10: Same object of class D
    (D.__eq__() implemented with inequality of x,
     D.__ne__() returning NotImplemented):

   obj1: type=<class '__main__.D'>, str=C(256), id=39407064
   obj2: type=<class '__main__.D'>, str=C(256), id=39407064

   a) obj1 is obj2: True
D.__eq__(): self=39407064, other=39407064, returning False
   b) obj1 == obj2: False
   c) [obj1] == [obj2]: True
   d) {obj1:'v'} == {obj2:'v'}: True
   e) {'k':obj1} == {'k':obj2}: True
D.__eq__(): self=39407064, other=39407064, returning False
   f) obj1 == obj2: False

Discussion:

The inequality-based implementation of __eq__() explains case 10.b). It 
is surprising (to me) that the list comparison in case 10.c) returns 
True. If one compares that to case 9.c), one could believe that the 
identities of the objects are used for both cases. But why would the 
list not respect the result of __eq__() if it is implemented?

This behavior is at least consistent with the surprise in case 6.c).
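Test #10 boils down to a class whose __eq__() returns False even for the 
same object. A minimal sketch with a made-up class NeverEqual shows that 
the list and both dict cases ignore that answer:

```python
class NeverEqual:
    """Hypothetical class whose __eq__() returns False, even against itself."""

    def __eq__(self, other):
        return False

    __hash__ = object.__hash__   # defining __eq__ alone would make it unhashable


obj = NeverEqual()
print(obj == obj)                  # False: __eq__() is honored
print([obj] == [obj])              # True: identity shortcut in the list
print({obj: 'v'} == {obj: 'v'})    # True: key lookup matches on identity
print({'k': obj} == {'k': obj})    # True: values compared identity-first
```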

In order not to rely only on the external behavior, I started digging 
into the C implementation. For list equality comparison, I started at 
list_richcompare(), which uses PyObject_RichCompareBool(). That function 
shortcuts its result based on identity comparison, and thus enforces 
reflexivity.

The comment on line 714 of object.c, in PyObject_RichCompareBool(), also 
confirms that:

   /* Quick result when objects are the same.
      Guarantees that identity implies equality. */
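Putting the C code together, list == roughly behaves like the following 
pure-Python model. It is a simplified sketch from reading 
list_richcompare() and PyObject_RichCompareBool(), not a translation: 
ordering operators, != and error handling are left out, and the function 
names are made up.

```python
def rich_compare_bool_eq(v, w):
    """Hypothetical model of PyObject_RichCompareBool(v, w, Py_EQ)."""
    # Quick result when objects are the same:
    # guarantees that identity implies equality.
    if v is w:
        return True
    return bool(v == w)


def list_eq(vl, wl):
    """Hypothetical model of list_richcompare(vl, wl, Py_EQ)."""
    if len(vl) != len(wl):
        return False          # lists of different lengths can't be equal
    for v, w in zip(vl, wl):
        if not rich_compare_bool_eq(v, w):
            return False
    return True


nan = float('nan')
print(list_eq([nan], [nan]))               # True, like [nan] == [nan]
print(list_eq([nan], [float('nan')]))      # False
print(list_eq([1, 2.0], [1.0, 2]))         # True
```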

IMHO, we need to discuss whether we are serious about the direction that 
was claimed earlier in this thread, that reflexivity (i.e. identity 
implies equality) should be decided by the classes and not by the 
Python language. As I see it, we have some pieces of code that enforce 
reflexivity, and some that don't.
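As an illustration of that split, using only built-in behavior of 
CPython: membership tests, list.count(), list.index() and set/dict 
lookups all treat an object as equal to itself, while == on the object 
does not.

```python
nan = float('nan')
items = [nan, 1.0]

# These go through identity-first comparison, so NaN matches itself:
print(nan in items)            # True
print(items.count(nan))        # 1
print(items.index(nan))        # 0
print(nan in {nan})            # True: set lookup checks identity before ==

# The == operator on the object itself does not:
print(nan == nan)              # False
print(nan in [float('nan')])   # False: a different NaN object
```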

Andy