[Python-Dev] == on object tests identity in 3.x - list delegation to members?
Andreas Maier
andreas.r.maier at gmx.de
Sun Jul 13 17:13:20 CEST 2014
On 11.07.2014 22:54, Ethan Furman wrote:
> On 07/11/2014 07:04 AM, Andreas Maier wrote:
>> On 09.07.2014 03:48, Raymond Hettinger wrote:
>>>
>>> Personally, I see no need to make the same mistake by removing
>>> the identity-implies-equality rule from the built-in containers.
>>> There's no need to upset the apple cart for nearly zero benefit.
>>
>> Containers delegate the equal comparison on the container to their
>> elements; they do not apply identity-based comparison
>> to their elements. At least that is the externally visible behavior.
>
> If that were true, then [NaN] == [NaN] would be False, and it is not.
>
> Here is the externally visible behavior:
>
> Python 3.5.0a0 (default:34881ee3eec5, Jun 16 2014, 11:31:20)
> [GCC 4.7.3] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> --> NaN = float('nan')
> --> NaN == NaN
> False
> --> [NaN] == [NaN]
> True
Ouch, that hurts ;-)
First, the delegation of sequence equality to element equality is not
something I have come up with during my doc patch. It has always been in
5.9 Comparisons of the Language Reference (copied from Python 3.4):
"Tuples and lists are compared lexicographically using comparison of
corresponding elements. This means that to compare equal, each element
must compare equal and the two sequences must be of the same type and
have the same length."
Second, if not by delegation to equality of its elements, how would the
equality of sequences be defined otherwise?
But your test is definitely worth having a closer look at. I have
broadened the test somewhat and that brings up further questions. Here
is the test output, and a discussion of the results (test program
try_eq.py and its output test_eq.out are attached to issue #12067):
Test #1: Different equal int objects:
obj1: type=<class 'int'>, str=257, id=39305936
obj2: type=<class 'int'>, str=257, id=39306160
a) obj1 is obj2: False
b) obj1 == obj2: True
c) [obj1] == [obj2]: True
d) {obj1:'v'} == {obj2:'v'}: True
e) {'k':obj1} == {'k':obj2}: True
f) obj1 == obj2: True
Discussion:
Case 1.c) can be interpreted that the list delegates its == to the == on
its elements. It cannot be interpreted to delegate to identity
comparison. That is consistent with how everyone (I hope ;-) would
expect int objects to behave, or lists or dicts of them.
The motivation for case f) is explained further down; it has to do with
caching.
Test #2: Same int object:
obj1: type=<class 'int'>, str=257, id=39305936
obj2: type=<class 'int'>, str=257, id=39305936
a) obj1 is obj2: True
b) obj1 == obj2: True
c) [obj1] == [obj2]: True
d) {obj1:'v'} == {obj2:'v'}: True
e) {'k':obj1} == {'k':obj2}: True
f) obj1 == obj2: True
-> No surprises (I hope).
Test #3: Different equal float objects:
obj1: type=<class 'float'>, str=257.0, id=5734664
obj2: type=<class 'float'>, str=257.0, id=5734640
a) obj1 is obj2: False
b) obj1 == obj2: True
c) [obj1] == [obj2]: True
d) {obj1:'v'} == {obj2:'v'}: True
e) {'k':obj1} == {'k':obj2}: True
f) obj1 == obj2: True
Discussion:
I added this test only to show that float NaN is a special case, and
that this test for float objects - that are not NaN - behaves like test
#1 for int objects.
Test #4: Same float object:
obj1: type=<class 'float'>, str=257.0, id=5734664
obj2: type=<class 'float'>, str=257.0, id=5734664
a) obj1 is obj2: True
b) obj1 == obj2: True
c) [obj1] == [obj2]: True
d) {obj1:'v'} == {obj2:'v'}: True
e) {'k':obj1} == {'k':obj2}: True
f) obj1 == obj2: True
-> Same as test #2, hopefully no surprises.
Test #5: Different float NaN objects:
obj1: type=<class 'float'>, str=nan, id=5734784
obj2: type=<class 'float'>, str=nan, id=5734976
a) obj1 is obj2: False
b) obj1 == obj2: False
c) [obj1] == [obj2]: False
d) {obj1:'v'} == {obj2:'v'}: False
e) {'k':obj1} == {'k':obj2}: False
f) obj1 == obj2: False
Discussion:
Here, the list behaves as I would expect under the rule that it
delegates equality to its elements. Case c) allows that interpretation.
However, an interpretation based on identity would also be possible.
Test #6: Same float NaN object:
obj1: type=<class 'float'>, str=nan, id=5734784
obj2: type=<class 'float'>, str=nan, id=5734784
a) obj1 is obj2: True
b) obj1 == obj2: False
c) [obj1] == [obj2]: True
d) {obj1:'v'} == {obj2:'v'}: True
e) {'k':obj1} == {'k':obj2}: True
f) obj1 == obj2: False
Discussion (this is Ethan's example):
Case 6.b) shows the special behavior of float NaN that is documented: a
float NaN object is the same as itself but unequal to itself.
Case 6.c) is the surprising case. It could be interpreted in two ways
(at least that's what I found):
1) The comparison is based on identity of the float objects. But that is
inconsistent with test #3. And why would the list special-case NaN
comparison in such a way that it ends up being inconsistent with the
special definition of NaN (outside of the list)?
2) The list does not always delegate to element equality, but attempts
to optimize if the objects are the same (same identity). We will see
later that this does happen. Further, when comparing float NaNs of the same
identity, the list implementation forgot to special-case NaNs, which
would be a bug, IMHO. I did not analyze the C implementation, so this is
all speculation based upon externally visible behavior.
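The difference between tests #5 and #6 can be reproduced in a few lines. This is only a minimal sketch, not the attached try_eq.py; the variable names are illustrative, and it assumes CPython, where each float('nan') call creates a new object:

```python
# Two distinct NaN objects, as in test #5, plus reuse of one of them, as in test #6.
nan_a = float('nan')
nan_b = float('nan')

element_equal = nan_a == nan_a                 # NaN is unequal to itself
same_object_lists_equal = [nan_a] == [nan_a]   # same object in both lists
different_object_lists_equal = [nan_a] == [nan_b]  # distinct NaN objects

print(element_equal, same_object_lists_equal, different_object_lists_equal)
```

Only the case with the same NaN object in both lists compares equal, which is the surprising case 6.c).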
Test #7: Different objects (with equal x) of class C
(C.__eq__() implemented with equality of x,
C.__ne__() returning NotImplemented):
obj1: type=<class '__main__.C'>, str=C(256), id=39406504
obj2: type=<class '__main__.C'>, str=C(256), id=39406616
a) obj1 is obj2: False
C.__eq__(): self=39406504, other=39406616, returning True
b) obj1 == obj2: True
C.__eq__(): self=39406504, other=39406616, returning True
c) [obj1] == [obj2]: True
C.__eq__(): self=39406616, other=39406504, returning True
d) {obj1:'v'} == {obj2:'v'}: True
C.__eq__(): self=39406504, other=39406616, returning True
e) {'k':obj1} == {'k':obj2}: True
C.__eq__(): self=39406504, other=39406616, returning True
f) obj1 == obj2: True
The __eq__() and __ne__() implementations each print a debug message.
The __ne__() is only defined to verify that it is not invoked, and that
the inherited default __ne__() does not chime in.
Discussion:
Here we see that the list equality comparison does invoke the element
equality. However, the picture becomes more complex further down.
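For readers without the attachment, a class along the lines of C might look like the sketch below. This is an assumption about its shape, not the code from try_eq.py: the eq_log list replaces the printed debug messages, and __hash__ is added here so that the objects can serve as dict keys.

```python
eq_log = []  # hypothetical stand-in for the debug messages printed by the test program

class C:
    def __init__(self, x):
        self.x = x

    def __eq__(self, other):
        result = self.x == other.x       # equality of the data attribute
        eq_log.append(("__eq__", result))
        return result

    def __ne__(self, other):
        eq_log.append(("__ne__", None))  # only here to detect unexpected calls
        return NotImplemented

    def __hash__(self):
        return hash(self.x)              # needed because __eq__ is overridden

obj1, obj2 = C(256), C(256)
direct_result = obj1 == obj2             # case b): invokes __eq__ once
list_result = [obj1] == [obj2]           # case c): invokes __eq__ once more
print(direct_result, list_result, eq_log)
```

With two different objects, each comparison adds exactly one __eq__ entry to the log and __ne__ is never invoked, matching the output of test #7.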
Test #8: Same object of class C
(C.__eq__() implemented with equality of x,
C.__ne__() returning NotImplemented):
obj1: type=<class '__main__.C'>, str=C(256), id=39406504
obj2: type=<class '__main__.C'>, str=C(256), id=39406504
a) obj1 is obj2: True
C.__eq__(): self=39406504, other=39406504, returning True
b) obj1 == obj2: True
c) [obj1] == [obj2]: True
d) {obj1:'v'} == {obj2:'v'}: True
e) {'k':obj1} == {'k':obj2}: True
C.__eq__(): self=39406504, other=39406504, returning True
f) obj1 == obj2: True
Discussion:
The == on the class C objects in case 8.b) invokes __eq__(), even though
the objects are the same object. This can be explained by the desire in
Python that classes should be able to be non-reflexive if needed, like
float NaN, for example.
Now, the list equality in case 8.c) is interesting. The list equality
does not invoke element equality. Even though object equality in case
8.b) did not assume reflexivity and invoked the __eq__() method, the
list seems to assume reflexivity and seems to go by object identity.
The only other potential explanation (that I found) would be that some
aspects of the comparison behavior are cached. That's why I added the
cases f), which show that caching for comparison results does not happen
(the __eq__() method is invoked again).
So we are back to discussing why element equality does not assume
reflexivity, but list equality does. IMHO, that is another bug, or maybe
the same one.
Test #9: Different objects (with equal x) of class D
(D.__eq__() implemented with inequality of x,
D.__ne__() returning NotImplemented):
obj1: type=<class '__main__.D'>, str=C(256), id=39407064
obj2: type=<class '__main__.D'>, str=C(256), id=39406952
a) obj1 is obj2: False
D.__eq__(): self=39407064, other=39406952, returning False
b) obj1 == obj2: False
D.__eq__(): self=39407064, other=39406952, returning False
c) [obj1] == [obj2]: False
D.__eq__(): self=39406952, other=39407064, returning False
d) {obj1:'v'} == {obj2:'v'}: False
D.__eq__(): self=39407064, other=39406952, returning False
e) {'k':obj1} == {'k':obj2}: False
D.__eq__(): self=39407064, other=39406952, returning False
f) obj1 == obj2: False
Discussion:
Class D implements __eq__() by != on the data attribute. This test does
not really show any surprises, and is consistent with the theory that
list comparison delegates to element comparison. This is really just a
preparation for the next test, which uses the same object of this class.
Test #10: Same object of class D
(D.__eq__() implemented with inequality of x,
D.__ne__() returning NotImplemented):
obj1: type=<class '__main__.D'>, str=C(256), id=39407064
obj2: type=<class '__main__.D'>, str=C(256), id=39407064
a) obj1 is obj2: True
D.__eq__(): self=39407064, other=39407064, returning False
b) obj1 == obj2: False
c) [obj1] == [obj2]: True
d) {obj1:'v'} == {obj2:'v'}: True
e) {'k':obj1} == {'k':obj2}: True
D.__eq__(): self=39407064, other=39407064, returning False
f) obj1 == obj2: False
Discussion:
The inequality-based implementation of __eq__() explains case 10.b). It
is surprising (to me) that the list comparison in case 10.c) returns
True. If one compares that to case 9.c), one could believe that the
identities of the objects are used for both cases. But why would the
list not respect the result of __eq__() if it is implemented?
This behavior at least seems to be consistent with the surprise of case 6.c).
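The behavior of test #10 can be sketched as follows. Again this is an illustrative reconstruction rather than the attached program: the eq_calls counter and the __hash__ method are assumptions made so the example is self-contained.

```python
eq_calls = []  # hypothetical record of every D.__eq__ invocation

class D:
    def __init__(self, x):
        self.x = x

    def __eq__(self, other):
        eq_calls.append((id(self), id(other)))
        return self.x != other.x   # deliberately "inverted" equality

    def __hash__(self):
        return hash(self.x)        # lets the object be used as a dict key

obj = D(256)
direct = obj == obj                        # case b): __eq__ is invoked
calls_after_direct = len(eq_calls)

in_list = [obj] == [obj]                   # case c)
in_dict_key = {obj: 'v'} == {obj: 'v'}     # case d)
in_dict_value = {'k': obj} == {'k': obj}   # case e)
calls_after_containers = len(eq_calls)

print(direct, in_list, in_dict_key, in_dict_value)
print(calls_after_direct, calls_after_containers)
```

In CPython the call count does not change across the three container comparisons, so none of them consulted D.__eq__() for the shared object.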
In order to not just rely on the external behavior, I started digging
into the C implementation. For list equality comparison, I started at
list_richcompare() which uses PyObject_RichCompareBool(), which
shortcuts its result based on identity comparison, and thus enforces
reflexivity.
The comment on line 714 in object.c in PyObject_RichCompareBool() also
confirms that:
/* Quick result when objects are the same.
Guarantees that identity implies equality. */
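Expressed at the Python level, the shortcut could be modeled roughly as below. This is a simplified sketch of the == case only: the function names are made up for illustration, and the list model ignores the type check and the other comparison operators that the C code handles.

```python
def rich_compare_bool_eq(a, b):
    """Rough model of the == case of PyObject_RichCompareBool()."""
    if a is b:               # the identity shortcut from the quoted comment
        return True
    return bool(a == b)      # otherwise fall back to the regular == protocol

def list_eq_model(left, right):
    """Rough model of list equality built on the function above."""
    if len(left) != len(right):
        return False
    return all(rich_compare_bool_eq(x, y) for x, y in zip(left, right))

nan = float('nan')
other_nan = float('nan')

print(nan == nan)                        # plain == does not take the shortcut
print(rich_compare_bool_eq(nan, nan))    # the model does
print(list_eq_model([nan], [nan]), [nan] == [nan])
print(list_eq_model([nan], [other_nan]), [nan] == [other_nan])
```

Under these assumptions the model agrees with the built-in list comparison for both the same-NaN and the different-NaN case.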
IMHO, we need to discuss whether we are serious with the direction that
was claimed earlier in this thread, that reflexivity (i.e. identity
implies equality) should be decided upon by the classes and not by the
Python language. As I see it, we have some pieces of code that enforce
reflexivity, and some that don't.
Andy