[Python-3000] Need closure on __cmp__ removal

David A. Wheeler dwheeler at dwheeler.com
Wed Jan 16 00:07:07 CET 2008


Bill Janssen:
>I'm a bit baffled here; I find cmp() fairly handy in writing sort routines...
>Is there a better / newer / official way of doing this? If not, isn't
>"cmp()" still useful to have around?

I agree with you - I find cmp() useful, and I notice that some others do too.
E.G., Adam Olson said "It's clear to me that the opposition to
removing __cmp__ comes down to "make the common things easy and
the rare things possible".  Removing __cmp__ means one of the
common things (total ordering) becomes hard."

hashcollision:
> On Jan 5, 2008 10:40 PM, hashcollision <hashcollision at gmail.com> wrote:
> > David A. Wheeler has already written a draft PEP, which can be found here:
> > http://www.dwheeler.com/misc/pep-cmp.txt.

You can blame me for that :-).


Guido van Rossum:
> Thanks, I'd missed that.
> 
> But alas, it's a bit short on the motivation for rich comparisons. For
> example it fails to notice that quite a few object types need to be
> comparable for equality but need not be orderable. Comparison for
> equality only is often much faster than a full three-way-compare,
> since there is likely some quick test that decides two instances
> cannot possibly be equal (e.g. lists of different lengths).

A fair comment.  But if you really only want to know 'equal/not-equal',
I think it's reasonable to presume that the user will just use == or !=,
instead of __cmp__ directly.  I don't propose removing == or !=, because
they're still needed for the asymmetric cases.  In those cases, I'd expect
that the user would write == or !=, which would then invoke __eq__ or
__ne__, which would make the decision.  If there's a performance difference,
for the basic types I'd expect Python to just implement == or != in the "faster"
way directly - so __cmp__ wouldn't matter in such cases.

I'll try to modify the draft PEP to cover that.

> It also misses the use case of overloading < and <= as set inclusion
> operators.

Okay.  Well, I can add this:
    Note that < and <= will continue to be overloaded as 
    set inclusion operators.

But I think there's more behind this comment that I don't understand.
Can you help me understand why that's a serious problem for __cmp__?

> But the biggest thing missing is precise semantics. Saying "exactly
> the same semantics as with Python 2.5" doesn't cut it (those semantics
> are incredibly hairy and sometimes surprising, and their
> implementation was a nightmare -- I've rarely been as relieved as when
> I was able to cut those out of the implementation).

Okay.  I made a stab, it's almost certainly wrong, suggestions welcome.
It should be _simple_, returning +1,0, or -1.

> A PEP should specify exactly what happens when both __cmp__ and one or
> more of the __xx__ operators are defined on the same class;

I think the goal should be "not surprising". IE: When you use a specific relationship like "<",
it calls __lt__(); the implementation of __lt__ may call __cmp__.

> it should
> specify what it means to override one or the other in a subclass;

Fair enough.  Hopefully that follows from the above.

> and
> it should define Python and C APIs for invoking __cmp__

I suggest using the 2.5 interfaces - is there a reason that must change?

> (and what it
> should do if __cmp__ isn't defined).

I'd expect it to do the same thing as calling __foo__ if __foo__ isn't defined.
Indeed, anything else would be surprising.

> I also notice that, because not-equal can often be decided more
> quickly than a full ordering, the argument (which I brought up
> myself!) that not having __cmp__ would slow down list comparisons may
> be wrong or at least weak. E.g., for the sake of argument, let's
> assume that __eq__ is just 10% faster than __cmp__, and __lt__ is as
> fast as __eq__. Then comparing two lists using __cmp__ on successive
> element pairs would be slower than using __eq__ plus one final __lt__
> if the deciding element is beyond the 9th index. If __eq__ is twice as
> fast as __cmp__ and __lt__ is the same speed as __cmp__, using __cmp__
> would be slower beyond the 2nd index.

True, but if that's true for a particular comparison, then the
implementation of __cmp__ could ITSELF implement the comparison that way.
Besides, if I understand you correctly, that presumes that the
user is keeping track of all the subelements through the comparison.
That's a pain; I'd rather say "compare these items", let the structures
do all the tracking, and get just the final answer.

Anyway, I hope this helps.

--- David A. Wheeler


More information about the Python-3000 mailing list