Armin Rigo writes:
Unless there are serious objections I suggest to (i.e. I plan to) remove the short-cut in PyObject_RichCompareBool() -- performance is probably not an issue here -- and then review all built-in comparison methods and make sure that they return "equal" for identical objects.
Robert H. Ledwith writes:
I have a serious objection to this because it is a major performance problem.
Performance is certainly a legitimate concern. But wouldn't your performance problems be fixed just as well if the tuple class implemented "identical-objects-are-immediately-equal" instead of putting it in the general comparison logic for all objects? Because if I understand correctly, that's what Armin is suggesting. Tuples would still be compared just as rapidly. -- Michael Chermside
Hello Michael, On Fri, May 21, 2004 at 08:07:53AM -0700, Michael Chermside wrote:
Performance is certainly a legitimate concern.
Ok, then the current situation looks like the best one. I suggest we just drop a note in the documentation saying that if there are objects for which x.__eq__(x) isn't necessarily true, you shouldn't expect them to be handled in a fully consistent way.
But wouldn't your performance problems be fixed just as well if the tuple class implemented "identical-objects-are-immediately-equal" instead of putting it in the general comparison logic for all objects? Because if I understand correctly, that's what Armin is suggesting. Tuples would still be compared just as rapidly.
There is an extra costly indirection until the execution reaches that point. Armin
Hello again Armin,
Performance is certainly a legitimate concern.
Ok, then the current situation looks like the best one. I suggest we just drop a note in the documentation saying that if there are objects for which x.__eq__(x) isn't necessarily true, you shouldn't expect them to be handled in a fully consistent way.
Is it true that the current list comparison algorithm goes trough every element doing identity comparison, and if every element in the first list *is* the element at the same position in the second list, and the list has the same size, then lists are considered equal? If that's true, then I belive there's no reason for not comparing if the first list *is* the second list at the top of list_richcompare(), right? -- Gustavo Niemeyer http://niemeyer.net
Gustavo Niemeyer <niemeyer@conectiva.com>:
Is it true that the current list comparison algorithm goes trough every element doing identity comparison, and if every element in the first list *is* the element at the same position in the second list, and the list has the same size, then lists are considered equal?
If that's true, then I belive there's no reason for not comparing if the first list *is* the second list at the top of list_richcompare(), right?
What if the list contains a NaN? Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+
Hello Greg, On Mon, May 24, 2004 at 12:40:21PM +1200, Greg Ewing wrote:
What if the list contains a NaN?
Uh, right. The whole issue with NaNs is muddy. Let's focus on Numeric as a much saner example of why you need 'a==a' to return something else than True. Armin
On Mon, May 24, 2004 at 12:40:21PM +1200, Greg Ewing wrote:
What if the list contains a NaN?
Uh, right. The whole issue with NaNs is muddy. Let's focus on Numeric as a much saner example of why you need 'a==a' to return something else than True.
More specifically, find a valid use case where bool(a==a) returns False (because Py_RichCompareBool() still does a boolean coercion on the result of a.__eq__(b)). For the use case to be compelling, it would need to justify breaking things like: mylist.append(x) assert x in mylist I don't believe that you will find sane examples. Identity-implying-equality is a useful and important invariant. Don't give it up easily. Raymond
On May 24, 2004, at 4:57 PM, Raymond Hettinger wrote:
On Mon, May 24, 2004 at 12:40:21PM +1200, Greg Ewing wrote:
What if the list contains a NaN?
Uh, right. The whole issue with NaNs is muddy. Let's focus on Numeric as a much saner example of why you need 'a==a' to return something else than True.
More specifically, find a valid use case where bool(a==a) returns False (because Py_RichCompareBool() still does a boolean coercion on the result of a.__eq__(b)).
For the use case to be compelling, it would need to justify breaking things like:
mylist.append(x) assert x in mylist
I don't believe that you will find sane examples. Identity-implying-equality is a useful and important invariant. Don't give it up easily.
I think the use case mentioned was something like Numeric, where it may be useful for anArray == anotherArray to return an array of bool, not a single True or False. It would also be useful for defining objects that implement some domain specific language in Python.. for example, something like an AppleScript or SQL bridge where you are basically building an AST, compiling it to some other form, and sending it over the wire. It may be convenient if you could do this without actually parsing the code as a string. -bob
[Bob Ippolito]
I think the use case mentioned was something like Numeric, where it may be useful for anArray == anotherArray to return an array of bool, not a single True or False
The code in question is in Py_RichCompareBool() which *always* just returns True or False. That routine is called by list.__contains__() and many other functions that expect a yes or no answer. The regular rich comparison function, Py_RichCompare() is the same as it always was and can still return arrays of bools, complex numbers, or anything at all. IOW, we're still looking for a use case that warrants removing the identity-implies-equality guarantee out of Py_RichCompareBool(). My argument focused on the invariant assumptions that would be broken if this was done. Second, I don't believe that a sane use case will be found for an object x that returns False for "bool(x==x)" and that returns False for "x in [x]". Stretching ones mind to find such an object seems like an exercise in masochism. Raymond
IOW, we're still looking for a use case that warrants removing the identity-implies-equality guarantee out of Py_RichCompareBool().
I can think of two use cases: 1) IEEE-like behavior. Before you say that there is nothing wrong with making floating-point comparison yield true for NaN==NaN, let me remind you of the beast called "Signaling NaN", which should raise an exception on any operation, including comparison. 2) A class might have comparison operations that put trace information in a log file for testing purposes. Such a class might be intended to help in testing searching algorithms, for example. I think it would be useful to be able to know whether x[i] is ever compared with x[i] by putting appropriate tracing comparison operations in the elements of x.
[Armin Rigo]
Ok, then the current situation looks like the best one. I suggest we just drop a note in the documentation saying that if there are objects for which x.__eq__(x) isn't necessarily true, you shouldn't expect them to be handled in a fully consistent way.
If that's *all* we say, it's not really helpful, is it? Besides, __eq__() isn't even required to return a truth value. If we believe the CPython implementation is correct, then the truth is more that the implementation is allowed to return True for x == y if x is y is true, regardless of what x.__eq__(y), y.__eq__(x), x.__cmp__(y) or y.__cmp__(x) would do if called directly. If that's what we really mean, then better to say that. It then sounds exactly like the cheap-ass micro optimization it is <wink>. [Michael Chermside]
But wouldn't your> performance problems be fixed just as well if the tuple class implemented "identical-objects-are-immediately-equal" instead of putting it in the general comparison logic for all objects? Because if I understand correctly, that's what Armin is suggesting. Tuples would still be compared just as rapidly.
There is an extra costly indirection until the execution reaches that point.
Well, gasoline is costly, and so is a Mercedes-Benz. Michael was replying to a fellow worried about factor-of-2 slowdowns. The indirection here isn't in that ballpark of expense. Still, it's not worth any expense unless it's useful. NaN comparison is a red herring not only because Python doesn't have a sane story in any area involving NaNs, but also because the 32 possible 754 comparison predicates can't be mapped onto 6 operator symbols. Note that in C99, the infix C relational operators raise an exception (754's "invalid operation") when applied to a NaN, and C99 adds a pile of new macros for non-exceptional NaN comparison (my favorite is the fetchingly named "islessgreater(x, y)" -- which "is similar to (x) < (y) || (x) > (y); however, islessgreater(x,y) does not raise the invalid exception when x and y are unordered (nor does it evaluate x and y twice)"). In a sane design for handling NaNs, it's important that the "legacy" comparison operators raise an exception in their presence, because code written under the assumption of trichotomy is a bug minefield when unordered NaNs become possible comparands. In short, if Python ever wants "correct" NaN comparison, it needs to add new operators or functions too.
participants (8)
-
Andrew Koenig
-
Armin Rigo
-
Bob Ippolito
-
Greg Ewing
-
Gustavo Niemeyer
-
Michael Chermside
-
Raymond Hettinger
-
Tim Peters