[Python-ideas] a sorting protocol dunder method?
Steven D'Aprano
steve at pearwood.info
Sun Dec 3 19:34:20 EST 2017
On Sun, Dec 03, 2017 at 03:06:02PM -0800, Chris Barker wrote:
> Recent python has moved toward a "key" function for customized sorting:
>
> list.sort(key=key_fun)
>
> key is also used (according to
> https://docs.python.org/3.6/library/functools.html#functools.cmp_to_key) in:
>
> min(), max(), heapq.nlargest(), heapq.nsmallest(), itertools.groupby()
>
> with this fairly broad use, it seems it's becoming a fairly universal
> protocol for ordering.
Be careful: there are two different concepts here, which are only
loosely related:
- ordering values, that is, whether or not we can say that x < y
- sorting values in a collection.
By default, we sort by the inherent order of the values. But if the
values have no inherent order (they are unordered), we can sort
unordered items in a collection by providing an appropriate key
function. Hence why I say they are loosely related.
For example, we can sort the normally unordered complex numbers by
providing a key function:
py> sorted([1+8j, 0+1j, 5+2j, 3-2j], key=lambda z: (z.real, z.imag))
[1j, (1+8j), (3-2j), (5+2j)]
But conceptually, I'm imposing an order on an otherwise unordered data
type. Complex numbers inherently have no order: it makes no sense to say
that 1+8j is less than 3-2j. But since the items in the list have to be
in *some* one-dimensional order, I can choose whichever order makes
sense for *this* collection:
py> sorted([1+8j, 0+1j, 5+2j, 3-2j], key=lambda z: abs(z))
[1j, (3-2j), (5+2j), (1+8j)]
Another collection of the same values might be ordered differently. It
doesn't make sense to put that functionality into the complex numbers
themselves: complex numbers are unordered, and any order we impose on
them comes from the collection, not the individual numbers.
> However, if you are writing a custom class, and want to make it "sortable",
> you need to define (some of) the total comparison operators, which
> presumably are then called O(n logn) number of times for comparisons when
> sorting.
>
> Or provide a sort key function when you actually do the sorting, which
> requires some inside knowledge of the objects you are sorting.
This is conflating the two distinct concepts: the comparison operators
apply to the values in the collection; the key function applies to the
collection itself (although it does need to have inside knowledge of the
items in the collection).
> But what if there was a sort key magic method:
>
> __key__ or __sort_key__ (or whatever)
>
> that would be called by the sorting functions if:
>
> no key function was specified
>
> and
>
> it exists
>
> It seems this would provide a easy way to make custom classes sortable that
> would be nicer for end users (not writing key functions), and possibly more
> performant in the "usual" case.
I'm not sure how you can talk about performance of the __key__ method
without knowing the implementation :-)
If this __key__ method is called like __lt__, then the big O() behaviour
will be the same, worst case, O(n log n). If it is called like the key
function, then the big O() behaviour will be the same as the key
function now. Either way, you're just changing the name of the function
called, not how it is called.
> In fact, it's striking me that there may well be classes that are defining
> the comparison magic methods not because they want the objects to "work"
> with the comparison operators, but because that want them to work with sort
> and min, and max, and...
It is conceivable, I suppose, but if I were designing an unordered data
type (like complex) I wouldn't implement ordering operators to allow
sorting, I'd provide a separate key function.
But that assumes that there's only ever one way to order a collection of
unordered items. I don't think that's a safe assumption. Again, look at
complex above, there are at least three obvious ways:
- order by magnitude;
- order by real part first, then imaginary part;
- order by the absolute values of the real and imaginary parts
(so that 1+2j and -1-2j sort together).
I don't think it makes sense to bake into an unodered data type a single
way of ordering. If there was such a natural order, then the data type
wouldn't be unordered and you should just define the comparison
operators.
> hmm, perhaps a __key__ method could even be used by the comparison
> operators, though that could result in pretty weird results when comparing
> two different types.
If the comparison operators fell back to calling __key__ when __lt__ etc
aren't defined, that would effectively force unordered types like
complex to be ordered.
> So: has this already been brought up and rejected?
>
> Am I imagining the performance benefits?
Probably.
> Is sorting-related functionally too special-case to deserve a protocol?
Yes.
--
Steve
More information about the Python-ideas
mailing list