[Python-Dev] Python3 regret about deleting list.sort(cmp=...)

Fredrik Johansson fredrik.johansson at gmail.com
Sat Mar 12 22:55:07 CET 2011


On Sat, Mar 12, 2011 at 9:44 PM, Guido van Rossum <guido at python.org> wrote:
> I was just reminded that in Python 3, list.sort() and sorted() no
> longer support the cmp (comparator) function argument. The reason is
> that the key function argument is always better. But now I have a
> nagging doubt about this:
>
> I recently advised a Googler who was sorting a large dataset and
> running out of memory. My analysis of the situation was that he was
> sorting a huge list of short lines of the form "shortstring,integer"
> with a key function that returned a tuple of the form ("shortstring",
> integer). Using the key function argument, in addition to N short
> string objects, this creates N tuples of length 2, N more slightly
> shorter string objects, and N integer objects. (Not to count a
> parallel array of N more pointers.) Given the object overhead, this
> dramatically increased the memory usage. It so happens that in this
> particular Googler's situation, memory is constrained but CPU time is
> not, and it would be better to parse the strings over and over again
> in a comparator function.
>
> But in Python 3 this solution is no longer available. How bad is that?
> I'm not sure. But I'd like to at least get the issue out in the open.

The removal of cmp and especially sort(cmp=...) was probably my least
favorite change in Python 3.

Sorting with a key only makes sense when the data can be mapped onto
builtin types (typically ints, strings, and tuples thereof) in an
order-preserving way. When this is not possible, you have to map it
onto a custom type and effectively implement a cmp function via the
comparison methods of that type. This is an ugly and slow substitute
for something that used to be elegant and fast.

Consider sorting a list of pairs representing fractions. This can be
done easily in Python 2.x with the comparison function lambda
(p,q),(r,s): cmp(p*s, q*r). In Python 2.6, this is about 40 times
faster than using fractions.Fraction as a key function.

Fredrik


More information about the Python-Dev mailing list