Guido rethinking removal of cmp from sort method

Wed Mar 30 05:06:20 EDT 2011

On Tue, Mar 29, 2011 at 03:35:40PM -0400, Terry Reedy wrote:
> For anyone interested, the tracker discussion on removing cmp is at
> http://bugs.python.org/issue1771
> There may have been more on the old py3k list and pydev list.
> 
> One point made there is that removing cmp= made list.sort consistent
> with all the other comparision functions,
> min/max/nsmallest/nlargest/groupby that only have a key arg. How
> many would really want cmp= added everywhere?

I wouldn't have a problem with it.

I would also like to react to the following.

Guido van Rossum in msg95975 on http://bugs.python.org/issue1771 wrote:
| Also, for all of you asking for cmp back, I hope you realize that 
| sorting N values using a custom cmp function makes about N log N calls 
| calls to cmp, whereas using a custom key calls the key function only N 
| times.  This means that even if your cmp function is faster than the 
| best key function you can write, the advantage is lost as N increases 
| (which is just where you'd like it to matter most :-).

This is a play on semantics. If you need python code to compare
two items, then this code will be called N log N times, independently
of the fact how this code is presented, as a cmp function or as rich
comparison methods. So forcing people to write a key function in cases
where this will only result in the cmp code being translated to __lt__
code, accomplishes nothing. 

As far as I can see, key will only produce significant speedups, if
comparing items can then be completly done internally in the python
engine without referencing user python code.

> A minor problem problem with cmp is that the mapping between return
> values and input comparisons is somewhat arbitrary. Does -1 mean a<b
> or b<a? (That can be learned and memorized, of course, though I tend
> to forget without constant use).

My rule of thumb is that a < b is equivallent with cmp(a, b) < 0

> A bigger problem is that it conflicts with key=. What is the result of
> l=[1,3,2]
> l.sort(cmp=lambda x,y:y-x, key=lambda x: x)
> print l
> ? (for answer, see http://bugs.python.org/issue11712 )
> 
> While that can also be learned, I consider conflicting parameters
> undesireable and better avoided when reasonably possible. So I see
> this thread as a discussion of the meaning of 'reasonably' in this
> particular case.

But what does this have to do with use cases? Does what is reasonable
depend on the current use cases without regard of possible future use
cases? Is the conflict between key and cmp a lesser problem in the
case of someone having a huge data set to sort on a computer that lacks
the resources to decorate as opposed to currently noone having such
a data set? Are we going to decide which functions/methods get a cmp
argument depening on which use cases we currently have that would need it?

This thread started with a request for use cases. But if you take this
kind of things into consideration, I don't see how use cases can then
make a big difference in the final decision.