
Hello,

A minor semantic change that crept in some time ago was an implicit assumption that any object x should "reasonably" be expected to compare equal to itself. The arguments are summarized below (should this be documented, inserted in NEWS, turned into a mini-PEP a posteriori ... ?).

The point is that comparisons now behave differently depending on whether they are issued by C or by Python code. The expression 'x == y' will always call x.__eq__(y), because in some settings (e.g. Numeric) the result is not just True or False but an arbitrary object (e.g. a Numeric array of zeroes and ones). Because of that, you cannot simply say that 'x == x' should return True, so the behavior of PyObject_RichCompare() didn't change. On the other hand, when C code does a comparison it uses PyObject_RichCompareBool(), meaning it is only interested in a 1 or 0 answer. So PyObject_RichCompareBool() is where the shortcut for comparing an object with itself has been inserted.

The result is the following: say x has a method __cmp__() that always returns -1.
    >>> x == x      # x.__cmp__() is called
    False
    >>> x < x       # x.__cmp__() is called
    True
    >>> cmp(x,x)    # x.__cmp__() is NOT called
    0
    >>> [x] == [x]  # x.__cmp__() is NOT called
    True
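The same asymmetry can still be observed in CPython 3, where cmp() and __cmp__() no longer exist but rich comparisons and the identity shortcut in PyObject_RichCompareBool() remain. A minimal sketch; the class `AlwaysUnequal` and its `calls` counter are hypothetical, invented only to make the calls to __eq__ visible:

```python
class AlwaysUnequal:
    """Hypothetical object whose __eq__ claims it equals nothing, not even itself."""

    calls = 0  # class-level counter of how often __eq__ actually runs

    def __eq__(self, other):
        AlwaysUnequal.calls += 1
        return False

    # Defining __eq__ would otherwise set __hash__ to None.
    __hash__ = object.__hash__


x = AlwaysUnequal()

print(x == x)      # the operator always dispatches to __eq__ -> False
print([x] == [x])  # list comparison skips identical elements -> True
print(x in [x])    # membership also checks identity first -> True
print(AlwaysUnequal.calls)  # only the bare x == x reached __eq__ -> 1
```

Here the list comparison and the `in` test play the role of "C code" in the description above: they ask for a 1-or-0 answer and never consult __eq__ for an element that is identical to its counterpart.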
The only way to explain the semantics is that the expression 'x == x' always calls the special methods, but built-in functions are allowed to assume that any object compares equal to itself. In other words, C code usually checks that objects are "identical or equal", which is equivalent to the Python expression 'x is y or x == y'. For example, the equality of lists now works as follows:

    def __eq__(lst1, lst2):
        if len(lst1) != len(lst2):
            return False
        for x, y in zip(lst1, lst2):
            if not (x is y or x == y):
                return False
        return True

Should any of this be documented?

An alternative behavior would have been to leave PyObject_RichCompareBool() alone and only insert the shortcut into specific object types' comparison methods on a case-by-case basis. For example, identical lists would just compare equal, without going through the loop comparing each element. This would remove the surprise of x.__cmp__(x) not always being called. The semantics would be easier to explain too: two lists are equal if they are the same list or if their elements compare pairwise equal. We would have (with x as above):
    >>> cmp(x,x)     # x.__cmp__(x) is called
    -1
    >>> [x] == [x]
    False            # because the lists contain a non-equal element
    >>> lst = [x]; lst == lst
    True             # because it is the same list
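The two list semantics being contrasted can be written side by side as plain Python functions. This is a sketch for illustration only; `eq_identity_or_equal`, `eq_per_type_shortcut` and the element class `AlwaysUnequal` are hypothetical names, not CPython internals:

```python
class AlwaysUnequal:
    """Hypothetical element type whose __eq__ always returns False."""

    def __eq__(self, other):
        return False

    __hash__ = object.__hash__


def eq_identity_or_equal(lst1, lst2):
    """Current behaviour: each element pair must be identical or equal."""
    if len(lst1) != len(lst2):
        return False
    for a, b in zip(lst1, lst2):
        if not (a is b or a == b):
            return False
    return True


def eq_per_type_shortcut(lst1, lst2):
    """Proposed alternative: identical lists are equal; otherwise use plain ==."""
    if lst1 is lst2:
        return True
    if len(lst1) != len(lst2):
        return False
    return all(a == b for a, b in zip(lst1, lst2))


x = AlwaysUnequal()
lst = [x]
print(eq_identity_or_equal([x], [x]))  # True: x is x
print(eq_per_type_shortcut([x], [x]))  # False: x == x is False
print(eq_per_type_shortcut(lst, lst))  # True: same list object
```

The only difference is where the identity test sits: on every element pair, or once on the container itself.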
Finally, whatever the final semantics are, we should make sure that existing built-in objects behave consistently. Of course I'm thinking about floats: on my Linux box,
    >>> f = float('nan')
    >>> cmp(f,f)
    0          # because f is f
    >>> f == f
    False      # because float.__eq__() is called
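For comparison, here is a sketch of how a current CPython 3 behaves (not the 2.x builds discussed in this thread). Float comparisons follow IEEE 754 there, yet containers still take the identity shortcut, so the inconsistency between the bare operator and built-in code survives, now between f == f and [f] == [f] rather than between f == f and cmp(f, f):

```python
f = float("nan")

print(f == f)                 # IEEE 754: NaN is unequal to itself -> False
print(f is f)                 # same object -> True
print([f] == [f])             # container comparison takes the identity shortcut -> True
print(f in [f])               # so does membership -> True
print([f] == [float("nan")])  # two distinct NaN objects fall back to == -> False
```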
Note that, as discussed below, the following behavior is *expected* and in accordance with standards:
    >>> float('nan') is float('nan')
    False
    >>> float('nan') == float('nan')
    False      # not the same object
Unless there are serious objections, I suggest (i.e. I plan) to remove the shortcut in PyObject_RichCompareBool() (performance is probably not an issue here) and then review all built-in comparison methods to make sure that they return "equal" for identical objects.

-+- summary of arguments that led to the original change (in CVS HEAD).

The argument in favor of the change is that it allowed removing complex and not very useful code that tried to compare self-recursive structures: for example, in some settings comparing __builtin__.__dict__ with itself would trigger recursive comparison of __builtin__.__dict__ with itself in an endless loop. The complex algorithm was able to spot that. The new semantics immediately assume __builtin__.__dict__ to be equal to itself. Removing the complex algorithm means that you will now get an endless loop when comparing two *non-identical* self-recursive structures with the same shape, which is most probably not a problem in practice (on the contrary, this implicit algorithm did hide a bug in one of my programs).

The argument _against_ used to be that, e.g., it makes sense for the float value 'nan' to be different from itself, as various standards require. This argument does not apply in Python: these standards are about comparing *values*, not objects, so it makes perfect sense to say that even if x is the result of a computation that yielded an unknown answer 'nan', this answer is still equal to itself; what it is probably not equal to is *another* 'nan' which was obtained differently. In other words, *two* float objects both containing 'nan' should be different, but *one* 'nan' object is still equal to itself. This is sane as long as no code considers 'nan' a singleton, or tries to reuse 'nan' objects for different 'nan' values.

-+-

Armin
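The trade-off described above can be checked on a current CPython 3 interpreter, which keeps the identity shortcut and has no cyclic-comparison machinery. A minimal sketch using self-recursive lists instead of __builtin__.__dict__, purely for brevity; note that modern CPython guards the C-level recursion, so the "endless loop" surfaces as a RecursionError rather than a hang:

```python
# Two *distinct* self-recursive lists with the same shape.
a = []
a.append(a)
b = []
b.append(b)

# Comparing a structure with itself stops at the identity shortcut.
print(a == a)  # True

# Comparing two different ones recurses without end; CPython 3 detects the
# runaway recursion and raises RecursionError instead of looping forever.
try:
    a == b
except RecursionError:
    print("RecursionError")
```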

Armin Rigo <arigo@tunes.org> writes:
Hello,
A minor semantic change that crept in some time ago was an implicit assumption that any object x should "reasonably" be expected to compare equal to itself. The arguments are summarized below (should this be documented, inserted in NEWS, turned into a mini-PEP a posteriori ... ?).
I have to admit that I haven't done my research, but I think you have your history backwards: the recent change AIUI is that it's possible that 'x == x' might be something *other* than Py_True... Cheers, mwh -- The ability to quote is a serviceable substitute for wit. -- W. Somerset Maugham

Hello Michael, On Tue, May 18, 2004 at 12:34:13PM +0100, Michael Hudson wrote:
A minor semantic change that crept in some time ago was an implicit assumption that any object x should "reasonably" be expected to compare equal to itself.
I have to admit that I haven't done my research, but I think you have your history backwards: the recent change AIUI is that it's possible that 'x == x' might be something *other* than Py_True...
Uh? You're probably referring to a much older change. It has been possible since at least Python 2.1, and I don't know if it has ever been impossible. I am speaking about these two changes in object.c:

----------------------------
revision 2.215
date: 2004/03/21 17:01:44;  author: rhettinger;  state: Exp;  lines: +11 -1
Add identity shortcut to PyObject_RichCompareBool.
----------------------------

and:

----------------------------
revision 2.211
date: 2003/10/28 12:05:47;  author: arigo;  state: Exp;  lines: +8 -180
Deleting cyclic object comparison.
SF patch 825639
http://mail.python.org/pipermail/python-dev/2003-October/039445.html
----------------------------

Armin

[Armin Rigo]
A minor semantic change that crept in some time ago was an implicit assumption that any object x should "reasonably" be expected to compare equal to itself.
[Michael Hudson]
I have to admit that I haven't done my research, but I think you have your history backwards: the recent change AIUI is that it's possible that 'x == x' might be something *other* than Py_True...
[Armin]
Uh? You're probably referring to a much older change. It was possible from at least Python 2.1, and I don't know if it has ever been impossible.
So-called "rich comparisons" were new in 2.1. Before then, all comparisons, however spelled, ended up invoking a spelling of cmp(), and PyObject_Compare() has had its current

    if (v == w)
        return 0;

special case ever since revision 1.1 of object.c (that was about 14(!) years ago now). So, yes, before Python 2.1, it wasn't possible for x == x (or x <= x or x >= x) to return false, or for x < x, x > x, or x != x to return true, or for cmp(x, x) to return anything other than 0. Rich comparisons changed all that except the last. At first in 2.1, and through all released Pythons since then, the "v==w implies equal" special case has remained in PyObject_Compare, but was left out of the implementations of the 6 "new" true/false relational operators.
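The pre-2.1 situation can be modelled in a few lines of Python 3. This is a hypothetical sketch (`old_style_compare` and `always_less` are invented names); it mirrors only the `if (v == w) return 0;` special case quoted above, where `v == w` in C is a pointer comparison, i.e. object identity, and shows why x == x could not come out false when every operator was derived from one three-way result:

```python
def old_style_compare(v, w, cmp3):
    """Model of pre-2.1 PyObject_Compare: identity wins before cmp3 is consulted."""
    if v is w:
        return 0
    return cmp3(v, w)


def always_less(v, w):
    """Hypothetical __cmp__ that always answers -1."""
    return -1


x = object()
# Every relational operator was derived from the single three-way result.
print(old_style_compare(x, x, always_less) == 0)  # x == x -> True
print(old_style_compare(x, x, always_less) < 0)   # x < x  -> False
```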
I am speaking about these two changes in object.c:

----------------------------
revision 2.215
date: 2004/03/21 17:01:44;  author: rhettinger;  state: Exp;  lines: +11 -1
Add identity shortcut to PyObject_RichCompareBool.
----------------------------
Raymond added that after you, Guido and I agreed to it at the PyCon sprints in March. It was specifically to ease your troubles with comparing recursive dictionaries, which started after the hideous cyclic-compare hacks were removed. As CVS records, removing them was also your fault <wink>:
and:
----------------------------
revision 2.211
date: 2003/10/28 12:05:47;  author: arigo;  state: Exp;  lines: +8 -180
Deleting cyclic object comparison.
SF patch 825639
http://mail.python.org/pipermail/python-dev/2003-October/039445.html
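The difference between the two C entry points discussed in this thread can be modelled in Python. This is a sketch, not CPython source: `rich_compare` and `rich_compare_bool` are hypothetical stand-ins for PyObject_RichCompare() and PyObject_RichCompareBool(), and the operator functions stand in for the C-level Py_EQ, Py_NE, etc.:

```python
import operator


def rich_compare(v, w, op):
    """Model of PyObject_RichCompare: always dispatches to the special method,
    whose result may be any object (e.g. an array of booleans)."""
    return op(v, w)


def rich_compare_bool(v, w, op):
    """Model of PyObject_RichCompareBool with the identity shortcut:
    identical objects are reported equal without calling __eq__/__ne__."""
    if v is w:
        if op is operator.eq:
            return True
        if op is operator.ne:
            return False
    return bool(rich_compare(v, w, op))


f = float("nan")
print(rich_compare(f, f, operator.eq))       # False: float.__eq__ is consulted
print(rich_compare_bool(f, f, operator.eq))  # True: shortcut, no call
```

Only == and != are short-circuited in this model; an ordering test such as < still reaches the type's own method even for identical operands.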

    >>> cmp(x,x)    # x.__cmp__() is NOT called
    0
This is odd. We'll have problems trying to implement a NaN following the convention of being not-equal to itself, for example.
    >>> [x] == [x]  # x.__cmp__() is NOT called
    True
This is correct, IMO. -- Gustavo Niemeyer http://niemeyer.net

    >>> cmp(x,x)    # x.__cmp__() is NOT called
    0
[Gustavo Niemeyer]
This is odd. We'll have problems trying to implement a NaN following the convention of being not-equal to itself, for example.
That's what rich comparisons are for. There's no problem getting
    >>> x = some_nan
    >>> x == x
    False
in Python 2.4 (the result of that is *currently* a platform-dependent crapshoot, though). cmp(x, y) doesn't make sense for objects having only a partial ordering, so I don't care what cmp() does with a NaN.
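The point that a three-way cmp() has no sensible answer for partially ordered values is easy to see with the `(a > b) - (a < b)` idiom often used in Python 3 as a stand-in for the removed cmp(). The helper name `cmp3` is illustrative only:

```python
def cmp3(a, b):
    """Three-way comparison built from rich comparisons: -1, 0 or 1."""
    return (a > b) - (a < b)


nan = float("nan")
print(cmp3(1.0, 2.0))  # -1
print(cmp3(nan, 1.0))  # 0: neither > nor < holds ...
print(nan == 1.0)      # ... yet the values are not equal -> False
```

A total-order answer of 0 ("equal") is the only thing cmp3 can return for unordered operands, which is exactly why it cannot describe NaN faithfully.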

Hello Gustavo, On Tue, May 18, 2004 at 02:18:43PM -0300, Gustavo Niemeyer wrote:
This is odd. We'll have problems trying to implement a NaN following the convention of being not-equal to itself, for example.
I know the issue is a bit obfuscated, but please do read my mail entirely first: this is exactly my example :-) Armin

This is odd. We'll have problems trying to implement a NaN following the convention of being not-equal to itself, for example.
I know the issue is a bit obfuscated, but please do read my mail entirely first: this is exactly my example :-)
Sorry.. I read your mail in a rush, and the main issue looked to be related to these examples. Here we go again. [...]
    >>> f = float('nan')
    >>> cmp(f,f)
    0          # because f is f
    >>> f == f
    False      # because float.__eq__() is called
I don't get the same results:
    >>> f = float('nan')
    >>> cmp(f,f)
    0
    >>> f == f
    True
Also on a Linux box.
Note that as discussed below the following behavior is *expected* and in accordance with standards:
    >>> float('nan') is float('nan')
    False
I believe that testing identity on NaN might return True in cases like:
    >>> f = float('nan')
    >>> f is f
    True
    >>> float('nan') == float('nan')
    False      # not the same object
This is expected in an IEEE754 compliant implementation. Something we currently are not (I get different results, as shown).
Unless there are serious objections, I suggest (i.e. I plan) to remove the shortcut in PyObject_RichCompareBool() (performance is probably not an issue here) and then review all built-in comparison methods to make sure that they return "equal" for identical objects.
+1 -- Gustavo Niemeyer http://niemeyer.net

Gustavo Niemeyer <niemeyer@conectiva.com> writes:
This is expected in an IEEE754 compliant implementation.
What's one of them? <wink>
Something we currently are not (I get different results, as shown).
Are you using 2.3 or 2.4 here? Cheers, mwh -- surely, somewhere, somehow, in the history of computing, at least one manual has been written that you could at least remotely attempt to consider possibly glancing at. -- Adam Rixey

This is expected in an IEEE754 compliant implementation.
What's one of them? <wink>
I can tell you.. or I'd have to kill you. ;-)
Something we currently are not (I get different results, as shown).
Are you using 2.3 or 2.4 here?
2.3. I wasn't aware of your recent change. Are you planning to include C99 stuff as well? -- Gustavo Niemeyer http://niemeyer.net

Gustavo Niemeyer <niemeyer@conectiva.com> writes:
This is expected in an IEEE754 compliant implementation.
What's one of them? <wink>
I can tell you.. or I'd have to kill you. ;-)
Something we currently are not (I get different results, as shown).
Are you using 2.3 or 2.4 here?
2.3. I wasn't aware of your recent change. Are you planning to include C99 stuff as well?
No. My recent change was mainly motivated by the wart that in Python 2.3 on Linux float('nan') == x came out True for any float x at all! I'm not enough of a numerics guy to know what people who would like real 754 support would actually want. Cheers, mwh -- Finding a needle in a haystack is a lot easier if you burn down the haystack and scan the ashes with a metal detector. -- the Silicon Valley Tarot (another one nicked from David Rush)
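The wart described above concerns NaN compared against arbitrary floats. A small check, assuming a current CPython 3 on an IEEE 754 platform (the sample values are arbitrary), shows what the corrected behaviour looks like: every equality and ordering test involving NaN is false, and only != holds:

```python
import operator

nan = float("nan")
samples = [0.0, -1.5, 1e308, float("inf"), float("-inf"), nan]

ops = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}

# Which operators give True for NaN against at least one sample?
holding = [name for name, op in ops.items() if any(op(nan, x) for x in samples)]
print(holding)  # ['!=']
```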

Armin Rigo <arigo@tunes.org>:
Unless there are serious objections, I suggest (i.e. I plan) to remove the shortcut in PyObject_RichCompareBool() (performance is probably not an issue here) and then review all built-in comparison methods to make sure that they return "equal" for identical objects.
Sounds right to me.

Greg Ewing, Computer Science Dept, University of Canterbury, Christchurch, New Zealand
greg@cosc.canterbury.ac.nz
"A citizen of NewZealandCorp, a wholly-owned subsidiary of USA Inc."

[Armin Rigo]
... Note that as discussed below the following behavior is *expected* and in accordance with standards:
    >>> float('nan') is float('nan')
    False
That result simply isn't defined, even on platforms where float('nan') doesn't blow up: floats are immutable objects, and outside of interning strings, Python doesn't define anything about the identity of immutable objects returned from computation.
    >>> float('nan') == float('nan')
    False      # not the same object
That one is a platform-dependent crapshoot, and again even on platforms where float('nan') doesn't blow up. In current CVS HEAD, Michael made it much more *likely* to return False than in any Python released to date, but there's still no guarantee. For example, even with current CVS, it returns True if compiled with MSVC 6, but False if compiled with MSVC 7.1. Python simply doesn't define anything about behavior in the presence of NaNs, infinities or signed zeroes (neither as floats nor as string representations like 'nan').
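Since NaN handling depends on comparisons and on object identity in the ways discussed here, a robust way to ask "is there a NaN?" avoids both. A sketch for current Python 3 (the list `values` is an arbitrary example): a freshly created NaN is a different object and never compares equal, so a membership test cannot find it, while math.isnan() inspects the value directly:

```python
import math

values = [1.0, float("nan"), 3.0]

# A new NaN object is neither identical nor equal to the one in the list.
print(float("nan") in values)              # False

# math.isnan() does not depend on identity or on ==.
print(any(math.isnan(v) for v in values))  # True
```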
Unless there are serious objections, I suggest (i.e. I plan) to remove the shortcut in PyObject_RichCompareBool() (performance is probably not an issue here) and then review all built-in comparison methods to make sure that they return "equal" for identical objects.
If your recursive dict compares work OK then, fine by me. The exact details of comparison semantics have always been muddy, and I expect always will be (the combination of backward compatibility hacks and new features makes comparison reality as complicated as the code implementing it -- that's some of the most complicated stuff in the entire codebase).
participants (5)

- Armin Rigo
- Greg Ewing
- Gustavo Niemeyer
- Michael Hudson
- Tim Peters