Mailman 3 RE: [Python-Dev] cmp(x,x) - Python-Dev

newer
Re: [Python-checkins]...

RE: [Python-Dev] cmp(x,x)

older
RE: cmp(x,x)

Michael Chermside

21 May 2004 21 May '04

10:07 a.m.

Armin Rigo writes:

...

Unless there are serious objections I suggest to (i.e. I plan to) remove the short-cut in PyObject_RichCompareBool() -- performance is probably not an issue here -- and then review all built-in comparison methods and make sure that they return "equal" for identical objects.

Robert H. Ledwith writes:

...

I have a serious objection to this because it is a major performance problem.

Performance is certainly a legitimate concern. But wouldn't your performance problems be fixed just as well if the tuple class implemented "identical-objects-are-immediately-equal" instead of putting it in the general comparison logic for all objects? Because if I understand correctly, that's what Armin is suggesting. Tuples would still be compared just as rapidly. -- Michael Chermside

Show replies by date

Armin Rigo

21 May 21 May

10:23 a.m.

New subject: cmp(x,x)

Hello Michael, On Fri, May 21, 2004 at 08:07:53AM -0700, Michael Chermside wrote:

...

Performance is certainly a legitimate concern.

Ok, then the current situation looks like the best one. I suggest we just drop a note in the documentation saying that if there are objects for which x.__eq__(x) isn't necessarily true, you shouldn't expect them to be handled in a fully consistent way.

...

But wouldn't your performance problems be fixed just as well if the tuple class implemented "identical-objects-are-immediately-equal" instead of putting it in the general comparison logic for all objects? Because if I understand correctly, that's what Armin is suggesting. Tuples would still be compared just as rapidly.

There is an extra costly indirection until the execution reaches that point. Armin

Gustavo Niemeyer

12:36 p.m.

New subject: cmp(x,x)

Hello again Armin,

...

...
Performance is certainly a legitimate concern.

Ok, then the current situation looks like the best one. I suggest we just drop a note in the documentation saying that if there are objects for which x.__eq__(x) isn't necessarily true, you shouldn't expect them to be handled in a fully consistent way.

Is it true that the current list comparison algorithm goes trough every element doing identity comparison, and if every element in the first list *is* the element at the same position in the second list, and the list has the same size, then lists are considered equal? If that's true, then I belive there's no reason for not comparing if the first list *is* the second list at the top of list_richcompare(), right? -- Gustavo Niemeyer http://niemeyer.net

Greg Ewing

23 May 23 May

7:40 p.m.

New subject: cmp(x,x)

Gustavo Niemeyer <niemeyer@conectiva.com>:

...

Is it true that the current list comparison algorithm goes trough every element doing identity comparison, and if every element in the first list *is* the element at the same position in the second list, and the list has the same size, then lists are considered equal?

...

If that's true, then I belive there's no reason for not comparing if the first list *is* the second list at the top of list_richcompare(), right?

What if the list contains a NaN? Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+

Armin Rigo

24 May 24 May

11:28 a.m.

New subject: cmp(x,x)

Hello Greg, On Mon, May 24, 2004 at 12:40:21PM +1200, Greg Ewing wrote:

...

What if the list contains a NaN?

Uh, right. The whole issue with NaNs is muddy. Let's focus on Numeric as a much saner example of why you need 'a==a' to return something else than True. Armin

Tim Peters

11:38 a.m.

New subject: cmp(x,x)

[Greg Ewing]

...

...
What if the list contains a NaN?

[Armin Rigo]

...

Uh, right. The whole issue with NaNs is muddy. Let's focus on Numeric as a much saner example of why you need 'a==a' to return something else than True.

Yes. Please <wink>.

Raymond Hettinger

3:57 p.m.

New subject: cmp(x,x)

...

On Mon, May 24, 2004 at 12:40:21PM +1200, Greg Ewing wrote:

...
What if the list contains a NaN?

Uh, right. The whole issue with NaNs is muddy. Let's focus on Numeric as a much saner example of why you need 'a==a' to return something else than True.

More specifically, find a valid use case where bool(a==a) returns False (because Py_RichCompareBool() still does a boolean coercion on the result of a.__eq__(b)). For the use case to be compelling, it would need to justify breaking things like: mylist.append(x) assert x in mylist I don't believe that you will find sane examples. Identity-implying-equality is a useful and important invariant. Don't give it up easily. Raymond

Bob Ippolito

5:04 p.m.

New subject: cmp(x,x)

On May 24, 2004, at 4:57 PM, Raymond Hettinger wrote:

...

...
On Mon, May 24, 2004 at 12:40:21PM +1200, Greg Ewing wrote:

...
What if the list contains a NaN?

Uh, right. The whole issue with NaNs is muddy. Let's focus on Numeric as a much saner example of why you need 'a==a' to return something else than True.

More specifically, find a valid use case where bool(a==a) returns False (because Py_RichCompareBool() still does a boolean coercion on the result of a.__eq__(b)).

For the use case to be compelling, it would need to justify breaking things like:

mylist.append(x) assert x in mylist

I don't believe that you will find sane examples. Identity-implying-equality is a useful and important invariant. Don't give it up easily.

I think the use case mentioned was something like Numeric, where it may be useful for anArray == anotherArray to return an array of bool, not a single True or False. It would also be useful for defining objects that implement some domain specific language in Python.. for example, something like an AppleScript or SQL bridge where you are basically building an AST, compiling it to some other form, and sending it over the wire. It may be convenient if you could do this without actually parsing the code as a string. -bob

Raymond Hettinger

6:52 p.m.

New subject: cmp(x,x)

[Bob Ippolito]

...

I think the use case mentioned was something like Numeric, where it may be useful for anArray == anotherArray to return an array of bool, not a single True or False

The code in question is in Py_RichCompareBool() which *always* just returns True or False. That routine is called by list.__contains__() and many other functions that expect a yes or no answer. The regular rich comparison function, Py_RichCompare() is the same as it always was and can still return arrays of bools, complex numbers, or anything at all. IOW, we're still looking for a use case that warrants removing the identity-implies-equality guarantee out of Py_RichCompareBool(). My argument focused on the invariant assumptions that would be broken if this was done. Second, I don't believe that a sane use case will be found for an object x that returns False for "bool(x==x)" and that returns False for "x in [x]". Stretching ones mind to find such an object seems like an exercise in masochism. Raymond

Andrew Koenig

25 May 25 May

11:07 a.m.

New subject: cmp(x,x)

...

IOW, we're still looking for a use case that warrants removing the identity-implies-equality guarantee out of Py_RichCompareBool().

I can think of two use cases: 1) IEEE-like behavior. Before you say that there is nothing wrong with making floating-point comparison yield true for NaN==NaN, let me remind you of the beast called "Signaling NaN", which should raise an exception on any operation, including comparison. 2) A class might have comparison operations that put trace information in a log file for testing purposes. Such a class might be intended to help in testing searching algorithms, for example. I think it would be useful to be able to know whether x[i] is ever compared with x[i] by putting appropriate tracing comparison operations in the elements of x.

Tim Peters

21 May 21 May

2:21 p.m.

New subject: cmp(x,x)

[Armin Rigo]

...

Ok, then the current situation looks like the best one. I suggest we just drop a note in the documentation saying that if there are objects for which x.__eq__(x) isn't necessarily true, you shouldn't expect them to be handled in a fully consistent way.

If that's *all* we say, it's not really helpful, is it? Besides, __eq__() isn't even required to return a truth value. If we believe the CPython implementation is correct, then the truth is more that the implementation is allowed to return True for x == y if x is y is true, regardless of what x.__eq__(y), y.__eq__(x), x.__cmp__(y) or y.__cmp__(x) would do if called directly. If that's what we really mean, then better to say that. It then sounds exactly like the cheap-ass micro optimization it is <wink>. [Michael Chermside]

...

...
But wouldn't your> performance problems be fixed just as well if the tuple class implemented "identical-objects-are-immediately-equal" instead of putting it in the general comparison logic for all objects? Because if I understand correctly, that's what Armin is suggesting. Tuples would still be compared just as rapidly.

...

There is an extra costly indirection until the execution reaches that point.

Well, gasoline is costly, and so is a Mercedes-Benz. Michael was replying to a fellow worried about factor-of-2 slowdowns. The indirection here isn't in that ballpark of expense. Still, it's not worth any expense unless it's useful. NaN comparison is a red herring not only because Python doesn't have a sane story in any area involving NaNs, but also because the 32 possible 754 comparison predicates can't be mapped onto 6 operator symbols. Note that in C99, the infix C relational operators raise an exception (754's "invalid operation") when applied to a NaN, and C99 adds a pile of new macros for non-exceptional NaN comparison (my favorite is the fetchingly named "islessgreater(x, y)" -- which "is similar to (x) < (y) || (x) > (y); however, islessgreater(x,y) does not raise the invalid exception when x and y are unordered (nor does it evaluate x and y twice)"). In a sane design for handling NaNs, it's important that the "legacy" comparison operators raise an exception in their presence, because code written under the assumption of trichotomy is a bug minefield when unordered NaNs become possible comparands. In short, if Python ever wants "correct" NaN comparison, it needs to add new operators or functions too.

7472

Age (days ago)

7476

Last active (days ago)

List overview

Download

10 comments

8 participants

participants (8)

Andrew Koenig
Armin Rigo
Bob Ippolito
Greg Ewing
Gustavo Niemeyer
Michael Chermside
Raymond Hettinger
Tim Peters

RE: [Python-Dev] cmp(x,x)

Gustavo Niemeyer

Andrew Koenig

tags

participants (8)