On Sun, Dec 03, 2017 at 03:06:02PM -0800, Chris Barker wrote:
Recent python has moved toward a "key" function for customized sorting:
list.sort(key=key_fun)
key is also used (according to https://docs.python.org/3.6/library/functools.html#functools.cmp_to_key) in:
min(), max(), heapq.nlargest(), heapq.nsmallest(), itertools.groupby()
with this fairly broad use, it seems it's becoming a fairly universal protocol for ordering.
Be careful: there are two different concepts here, which are only loosely related: - ordering values, that is, whether or not we can say that x < y - sorting values in a collection. By default, we sort by the inherent order of the values. But if the values have no inherent order (they are unordered), we can sort unordered items in a collection by providing an appropriate key function. Hence why I say they are loosely related. For example, we can sort the normally unordered complex numbers by providing a key function: py> sorted([1+8j, 0+1j, 5+2j, 3-2j], key=lambda z: (z.real, z.imag)) [1j, (1+8j), (3-2j), (5+2j)] But conceptually, I'm imposing an order on an otherwise unordered data type. Complex numbers inherently have no order: it makes no sense to say that 1+8j is less than 3-2j. But since the items in the list have to be in *some* one-dimensional order, I can choose whichever order makes sense for *this* collection: py> sorted([1+8j, 0+1j, 5+2j, 3-2j], key=lambda z: abs(z)) [1j, (3-2j), (5+2j), (1+8j)] Another collection of the same values might be ordered differently. It doesn't make sense to put that functionality into the complex numbers themselves: complex numbers are unordered, and any order we impose on them comes from the collection, not the individual numbers.
However, if you are writing a custom class, and want to make it "sortable", you need to define (some of) the total comparison operators, which presumably are then called O(n logn) number of times for comparisons when sorting.
Or provide a sort key function when you actually do the sorting, which requires some inside knowledge of the objects you are sorting.
This is conflating the two distinct concepts: the comparison operators apply to the values in the collection; the key function applies to the collection itself (although it does need to have inside knowledge of the items in the collection).
But what if there was a sort key magic method:
__key__ or __sort_key__ (or whatever)
that would be called by the sorting functions if:
no key function was specified
and
it exists
It seems this would provide a easy way to make custom classes sortable that would be nicer for end users (not writing key functions), and possibly more performant in the "usual" case.
I'm not sure how you can talk about performance of the __key__ method without knowing the implementation :-) If this __key__ method is called like __lt__, then the big O() behaviour will be the same, worst case, O(n log n). If it is called like the key function, then the big O() behaviour will be the same as the key function now. Either way, you're just changing the name of the function called, not how it is called.
In fact, it's striking me that there may well be classes that are defining the comparison magic methods not because they want the objects to "work" with the comparison operators, but because that want them to work with sort and min, and max, and...
It is conceivable, I suppose, but if I were designing an unordered data type (like complex) I wouldn't implement ordering operators to allow sorting, I'd provide a separate key function. But that assumes that there's only ever one way to order a collection of unordered items. I don't think that's a safe assumption. Again, look at complex above, there are at least three obvious ways: - order by magnitude; - order by real part first, then imaginary part; - order by the absolute values of the real and imaginary parts (so that 1+2j and -1-2j sort together). I don't think it makes sense to bake into an unodered data type a single way of ordering. If there was such a natural order, then the data type wouldn't be unordered and you should just define the comparison operators.
hmm, perhaps a __key__ method could even be used by the comparison operators, though that could result in pretty weird results when comparing two different types.
If the comparison operators fell back to calling __key__ when __lt__ etc aren't defined, that would effectively force unordered types like complex to be ordered.
So: has this already been brought up and rejected?
Am I imagining the performance benefits?
Probably.
Is sorting-related functionally too special-case to deserve a protocol?
Yes. -- Steve