Q: sort's key and cmp parameters
Steven D'Aprano
steven at REMOVE.THIS.cybersource.com.au
Mon Oct 5 23:51:52 EDT 2009
On Mon, 05 Oct 2009 19:34:27 -0700, Paul Rubin wrote:
> Raymond Hettinger <python at rcn.com> writes:
>> When Guido made the final call, I'm sure he was balancing a number of
>> goals including language simplification and one-way-to-do-it. I'm also
>> sure that sorting lazily evaluated infinite sequences was not on his
>> radar screen :-)
>
> I can't help wondering if there will be feature requests for a cmp
> argument for the rest of eternity until it eventually makes its way back
> in.
When I learned that cmp was being dropped, I was horrified. That lasted
for about fifteen minutes of time actually using key :)
Occasionally I come across sorting problems where it isn't obvious how to
write the key function, but I've never come across one where it wasn't
possible at all. I no longer miss cmp.
>> Psychologically, the thing that I find to be interesting is that
>> beginners and intermediate users seem to take to key functions more
>> readily than old timers.
>
> I think key is preferable in at least 90% of the cases. I also think a
> general purpose language should be able to handle 100% of the cases, so
> if key is all you have, there can be a gap to fill.
I don't think it's possible to enumerate 100% of the cases, let alone
support them.
Here's one thing I've sometimes wanted to do: sort multiple lists at
once, according to the contents of one of them, with a single call. You
can get the same result with a key function and sorting each list
separately, but if you've got a lot of large lists, that becomes
expensive.
Python's sort is a memory-sort -- you can't sort a list too big to fit
into memory. Sorting 400GB of data using a single function call is not
something I see as fundamental to a general purpose language (although of
course you should be able to write a disk-sort utility using Python).
Nor does sort() support the case where the value of the comparisons
differs according to where the elements are in the list, e.g. given:
[a, b, c, d, a, b]
the result of comparing a and b at the start of the list is different
from comparing them at the end of the list. I don't think there's an
actual use-case for this feature, and cmp would have just as much trouble
solving this (pretend) problem as key does, but I don't think languages
need to supply this as a fundamental operation. If you need it, write
your own sort function :)
Which of course would be a solution to your problem. The source code for
timsort is available, and stable. You could fork it, re-enable the
support for comparison functions, and put it on the Cheeseshop. If there
is sufficient interest, maybe you'll get cmp support re-added to the
built-in in a version or two.
--
Steven
More information about the Python-list
mailing list