Q: sort's key and cmp parameters

Steven D'Aprano steven at REMOVE.THIS.cybersource.com.au
Mon Oct 5 23:51:52 EDT 2009


On Mon, 05 Oct 2009 19:34:27 -0700, Paul Rubin wrote:

> Raymond Hettinger <python at rcn.com> writes:

>> When Guido made the final call, I'm sure he was balancing a number of
>> goals including language simplification and one-way-to-do-it.  I'm also
>> sure that sorting lazily evaluated infinite sequences was not on his
>> radar screen :-)
>
> I can't help wondering if there will be feature requests for a cmp
> argument for the rest of eternity until it eventually makes its way back
> in.

When I learned that cmp was being dropped, I was horrified. That lasted 
for about fifteen minutes of time actually using key :)

Occasionally I come across sorting problems where it isn't obvious how to 
write the key function, but I've never come across one where it wasn't 
possible at all. I no longer miss cmp.


>> Psychologically, the thing that I find to be interesting is that
>> beginners and intermediate users seem to take to key functions more
>> readily than old timers.
> 
> I think key is preferable in at least 90% of the cases.  I also think a
> general purpose language should be able to handle 100% of the cases, so
> if key is all you have, there can be a gap to fill.

I don't think it's possible to enumerate 100% of the cases, let alone 
support them. 

Here's one thing I've sometimes wanted to do: sort multiple lists at 
once, according to the contents of one of them, with a single call. You 
can get the same result with a key function and sorting each list 
separately, but if you've got a lot of large lists, that becomes 
expensive.

Python's sort is a memory-sort -- you can't sort a list too big to fit 
into memory. Sorting 400GB of data using a single function call is not 
something I see as fundamental to a general purpose language (although of 
course you should be able to write a disk-sort utility using Python).

Nor does sort() support the case where the value of the comparisons 
differs according to where the elements are in the list, e.g. given:

[a, b, c, d, a, b]

the result of comparing a and b at the start of the list is different 
from comparing them at the end of the list. I don't think there's an 
actual use-case for this feature, and cmp would have just as much trouble 
solving this (pretend) problem as key does, but I don't think languages 
need to supply this as a fundamental operation. If you need it, write 
your own sort function :)

Which of course would be a solution to your problem. The source code for 
timsort is available, and stable. You could fork it, re-enable the 
support for comparison functions, and put it on the Cheeseshop. If there 
is sufficient interest, maybe you'll get cmp support re-added to the 
built-in in a version or two.



-- 
Steven



More information about the Python-list mailing list