a sorting protocol dunder method?

I can't believe this hasn't been brought up before, but searching the web, python-ideas, and all the PEPs has turned up nothing (could be my lame google-fu), so here goes:

Recent Python has moved toward a "key" function for customized sorting:

    list.sort(key=key_fun)

key is also used (according to https://docs.python.org/3.6/library/functools.html#functools.cmp_to_key) in:

    min(), max(), heapq.nlargest(), heapq.nsmallest(), itertools.groupby()

With this fairly broad use, it seems it's becoming a fairly universal protocol for ordering.

However, if you are writing a custom class and want to make it "sortable", you need to define (some of) the total comparison operators, which presumably are then called O(n log n) times for comparisons when sorting. Or you provide a sort key function when you actually do the sorting, which requires some inside knowledge of the objects you are sorting.

But what if there were a sort key magic method:

    __key__ or __sort_key__ (or whatever)

that would be called by the sorting functions if:

- no key function was specified, and
- it exists.

It seems this would provide an easy way to make custom classes sortable that would be nicer for end users (not writing key functions), and possibly more performant in the "usual" case.

In fact, it strikes me that there may well be classes that define the comparison magic methods not because they want the objects to "work" with the comparison operators, but because they want them to work with sort, and min, and max, and...

Hmm, perhaps a __key__ method could even be used by the comparison operators, though that could result in pretty weird results when comparing two different types.

So:

- Has this already been brought up and rejected?
- Am I imagining the performance benefits?
- Is sorting-related functionality too special-case to deserve a protocol?

Thoughts?

-Chris

--
Christopher Barker, Ph.D.
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
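[Editor's note: for concreteness, here is a minimal sketch of how the proposed protocol could be emulated today with a plain wrapper function. The `__sort_key__` name and the `sorted_by_key` helper are hypothetical, not an existing API.]

```python
def sorted_by_key(iterable, key=None, reverse=False):
    """Emulate the proposed protocol: if no explicit key is given,
    fall back to an item's __sort_key__ method when it exists."""
    items = list(iterable)
    if key is None and items and hasattr(type(items[0]), "__sort_key__"):
        # An unbound method works directly as a key function
        key = type(items[0]).__sort_key__
    return sorted(items, key=key, reverse=reverse)

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y
    def __sort_key__(self):
        return (self.x, self.y)

pts = [Point(3, 1), Point(1, 2), Point(1, 0)]
print([(p.x, p.y) for p in sorted_by_key(pts)])  # [(1, 0), (1, 2), (3, 1)]
```

An explicit key argument still wins, matching the "if no key function was specified" condition above.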

On Sun, Dec 3, 2017 at 6:06 PM, Chris Barker <chris.barker@noaa.gov> wrote:
In fact, it strikes me that there may well be classes that define the comparison magic methods not because they want the objects to "work" with the comparison operators, but because they want them to work with sort, and min, and max, and...
An existence proof: in NLTK, an __lt__ method was added purely to facilitate consistent sorting (in doctests) of structured data objects for which comparison operators do not really make conceptual sense: https://github.com/nltk/nltk/pull/1902/files#diff-454368f06fd635b1e06c9bb6d6... Granted, calling min() and max() on collections of these objects would not make conceptual sense either. Still, __sort_key__ would have been cleaner than __lt__. Cheers, Nathan

On Sun, Dec 03, 2017 at 06:46:45PM -0500, Nathan Schneider wrote:
On Sun, Dec 3, 2017 at 6:06 PM, Chris Barker <chris.barker@noaa.gov> wrote:
In fact, it strikes me that there may well be classes that define the comparison magic methods not because they want the objects to "work" with the comparison operators, but because they want them to work with sort, and min, and max, and...
An existence proof: in NLTK, an __lt__ method added purely to facilitate consistent sorting (in doctests) of structured data objects for which comparison operators do not really make conceptual sense: https://github.com/nltk/nltk/pull/1902/files#diff-454368f06fd635b1e06c9bb6d6...
This shows the problem with putting the key function into the data type. What if I want to sort AttrDicts by their list of keys instead? Or their (key, value) pairs? What is so special about sorting by ID (which may not even exist!) that it deserves to be part of the AttrDict itself?

The practical answer is that it is a convenience for doctests, but it would have been almost as convenient for nltk to provide a convenience sorting function to hold that knowledge as to bake it into the AttrDict itself. Or a key function that you can pass to sorted.

My solution to this would have been to add a key function to the class:

    @staticmethod
    def _id_order(item):
        # Convenience function for doctests
        return item['ID']

and then sort like this:

    sorted(list_of_attrdicts, key=AttrDict._id_order)

A little less convenient, but conceptually cleaner and more explicit.

--
Steve
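[Editor's note: a runnable version of the pattern described above. This `AttrDict` is a minimal stand-in for illustration, not NLTK's actual class.]

```python
class AttrDict(dict):
    """Minimal stand-in for a dict-like record class (illustrative only)."""
    @staticmethod
    def _id_order(item):
        # Convenience key function for doctests
        return item['ID']

records = [AttrDict(ID=3), AttrDict(ID=1), AttrDict(ID=2)]
ordered = sorted(records, key=AttrDict._id_order)
print([d['ID'] for d in ordered])  # [1, 2, 3]
```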

On Sun, Dec 3, 2017 at 3:06 PM, Chris Barker <chris.barker@noaa.gov> wrote:
However, if you are writing a custom class ... <snip>
But what if there was a sort key magic method:
__key__ or __sort_key__ (or whatever)
that would be called by the sorting functions <snip>
It seems this would provide an easy way to make custom classes sortable that would be nicer for end users (not writing key functions), and possibly more performant in the "usual" case.
On Sun, Dec 3, 2017 at 4:57 PM, Steven D'Aprano <steve@pearwood.info> wrote:
This shows the problem with putting the key function into the data type. What if I want to sort AttrDicts by their list of keys instead? Or their (key, value) pairs? What is so special about sorting by ID (which may not even exist!) that it deserves to be part of the AttrDict itself?
I think you're arguing against this for the wrong reason. Chris was talking about custom classes having the *option* of making them sortable by providing a key method in the class definition. This strikes me as useful and I can imagine having used this if it were available. What you're saying is that there are classes which probably shouldn't define a __sort_key__ function, which I quite agree with. But I don't think it's a good argument against this proposal. On Sun, Dec 3, 2017 at 3:06 PM, Chris Barker <chris.barker@noaa.gov> wrote:
Am I imagining the performance benefits?
Maybe. Looking strictly at big-O cost, there's no difference between a key function and comparison operators. Sure, it might potentially only make O(n) calls to the key function and O(n log n) calls to compare the keys vs. O(n log n) calls to the comparator functions, but that might not actually be faster. There certainly are cases where implementing a key function would be quite slow. --- Bruce

I'm not sure I understand the motivation to make elements *sortable* but not comparable. If an arbitrary order is still useful, I'd think you'd want to be able to tell how two particular elements *would* sort by asking a<b. On Dec 3, 2017 6:55 PM, "Bruce Leban" <bruce@leban.us> wrote:
On Sun, Dec 3, 2017 at 3:06 PM, Chris Barker <chris.barker@noaa.gov> wrote:
However, if you are writing a custom class ... <snip>
But what if there was a sort key magic method:
__key__ or __sort_key__ (or whatever)
that would be called by the sorting functions <snip>
It seems this would provide an easy way to make custom classes sortable that would be nicer for end users (not writing key functions), and possibly more performant in the "usual" case.
On Sun, Dec 3, 2017 at 4:57 PM, Steven D'Aprano <steve@pearwood.info> wrote:
This shows the problem with putting the key function into the data type. What if I want to sort AttrDicts by their list of keys instead? Or their (key, value) pairs? What is so special about sorting by ID (which may not even exist!) that it deserves to be part of the AttrDict itself?
I think you're arguing against this for the wrong reason. Chris was talking about custom classes having the *option* of making them sortable by providing a key method in the class definition. This strikes me as useful and I can imagine having used this if it were available. What you're saying is that there are classes which probably shouldn't define a __sort_key__ function, which I quite agree with. But I don't think it's a good argument against this proposal.
On Sun, Dec 3, 2017 at 3:06 PM, Chris Barker <chris.barker@noaa.gov> wrote:
Am I imagining the performance benefits?
Maybe. Looking strictly at big-O cost, there's no difference between a key function and comparison operators. Sure, it might potentially only make O(n) calls to the key function and O(n log n) calls to compare the keys vs. O(n log n) calls to the comparator functions, but that might not actually be faster. There certainly are cases where implementing a key function would be quite slow.
--- Bruce
_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/

And if this is a method on a custom *collection*, it can do whatever it wants in MyCollection.sort() already. On Dec 3, 2017 7:14 PM, "David Mertz" <mertz@gnosis.cx> wrote:
I'm not sure I understand the motivation to make elements *sortable* but not comparable. If an arbitrary order is still useful, I'd think you'd want to be able to tell how two particular elements *would* sort by asking a<b.
On Dec 3, 2017 6:55 PM, "Bruce Leban" <bruce@leban.us> wrote:
On Sun, Dec 3, 2017 at 3:06 PM, Chris Barker <chris.barker@noaa.gov> wrote:
However, if you are writing a custom class ... <snip>
But what if there was a sort key magic method:
__key__ or __sort_key__ (or whatever)
that would be called by the sorting functions <snip>
It seems this would provide an easy way to make custom classes sortable that would be nicer for end users (not writing key functions), and possibly more performant in the "usual" case.
On Sun, Dec 3, 2017 at 4:57 PM, Steven D'Aprano <steve@pearwood.info> wrote:
This shows the problem with putting the key function into the data type. What if I want to sort AttrDicts by their list of keys instead? Or their (key, value) pairs? What is so special about sorting by ID (which may not even exist!) that it deserves to be part of the AttrDict itself?
I think you're arguing against this for the wrong reason. Chris was talking about custom classes having the *option* of making them sortable by providing a key method in the class definition. This strikes me as useful and I can imagine having used this if it were available. What you're saying is that there are classes which probably shouldn't define a __sort_key__ function, which I quite agree with. But I don't think it's a good argument against this proposal.
On Sun, Dec 3, 2017 at 3:06 PM, Chris Barker <chris.barker@noaa.gov> wrote:
Am I imagining the performance benefits?
Maybe. Looking strictly at big-O cost, there's no difference between a key function and comparison operators. Sure, it might potentially only make O(n) calls to the key function and O(n log n) calls to compare the keys vs. O(n log n) calls to the comparator functions, but that might not actually be faster. There certainly are cases where implementing a key function would be quite slow.
--- Bruce

On Sun, Dec 03, 2017 at 06:53:45PM -0800, Bruce Leban wrote:
I think you're arguing against this for the wrong reason. Chris was talking about custom classes having the *option* of making them sortable by providing a key method in the class definition.
I never imagined that it would be a required method. Of course it is optional. But by adding interpreter support, we're blessing something which I think is a misguided design choice as the One Obvious Way.

We're taking something which belongs in the report generator or collection, the knowledge of how to sort a collection of unordered values, and baking it into the values themselves. (Effectively making them ordered!)

That's the wrong design. Your report needs to know about your values; your values shouldn't need to know how the report is formatted.

It's like saying that you want an Email object, and a SMTP_Server object, but to make it more convenient for the SMTP_Server object, we should give the Email objects themselves a method that knows how to talk to port 25 and send themselves. Then the SMTP_Server just calls email.send() on each message. How convenient.

The same applies here. Sure, it is convenient to just call bank_accounts.sort() (to re-use the example given by Carl) and it magically works, but as soon as your report changes and you want the bank accounts sorted according to their account name instead of balance, you have to either provide a key function, or change the __key__ method. Obviously changing the __key__ method will break any other reports that rely on it, so you end up using a key function anyway.

I would mind this less if it weren't blessed by the interpreter. There are lots of classes which are excessively coupled to other things. I've written a few of them myself, so I understand the temptation. Sometimes that design might even be justified under "Practicality beats purity". But I don't think this is one of those cases: I don't see this as important enough or common enough to build it into the language as an actual dunder method.

If people like Chris' idea, just add a sortkey() method to your class, and then call

    sorted(collection_of_Spam, key=Spam.sortkey)

and it will Just Work.
It is explicit, backwards compatible and doesn't need to wait for Python 3.8 or whenever this (mis)feature gets (hypothetically) added.
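[Editor's note: the sortkey() pattern suggested above, as a runnable sketch. The `Spam` class and its fields are invented for illustration.]

```python
class Spam:
    def __init__(self, name, balance):
        self.name = name
        self.balance = balance
    def sortkey(self):
        # A plain method; no interpreter support needed
        return self.balance

accounts = [Spam("b", 30), Spam("a", 10), Spam("c", 20)]
# An unbound method works directly as a key function:
by_balance = sorted(accounts, key=Spam.sortkey)
print([s.name for s in by_balance])  # ['a', 'c', 'b']
```

If the report later needs a different order, you pass a different key function and the class is untouched.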
On Sun, Dec 3, 2017 at 3:06 PM, Chris Barker <chris.barker@noaa.gov> wrote:
Am I imagining the performance benefits?
Maybe. Looking strictly at big-O cost, there's no difference between a key function and comparison operators. Sure, it might potentially only make O(n) calls to the key function and O(n log n) calls to compare the keys vs. O(n log n) calls to the comparator functions, but that might not actually be faster.
It is unlikely that calling a key function followed by key comparisons would be faster than just calling the key comparisons. Using a key function is effectively the old DSU idiom:

    values = [(key(x), x) for x in values]  # O(n)
    values.sort()                           # O(n log n)
    values = [t[1] for t in values]         # O(n)

so you make two extra passes through the list. The only way that could be faster is if key(x).__lt__ is sufficiently cheap compared to x.__lt__ that it saves more than the cost of those two extra passes (plus the overhead from dealing with the extra tuples).

You might be thinking of the old Python 1 and early Python 2 cmp argument to sort. The comparator function can end up calling x.__lt__ up to O(n**2) times if I remember correctly, so it is quite expensive.
There certainly are cases where implementing a key function would be quite slow.
The biggest problem with a key function is that it trades off memory for time. If you are constrained by memory, but don't care how slow your sort is, an old-school comparison function might suit you better. But for those cases, functools.cmp_to_key may help. -- Steve
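[Editor's note: for reference, the functools.cmp_to_key bridge mentioned above turns an old-style three-way comparator into a key function, as a minimal sketch.]

```python
from functools import cmp_to_key

def compare(a, b):
    # Old-style three-way comparator: negative, zero, or positive
    return (a > b) - (a < b)

data = [3, 1, 2]
print(sorted(data, key=cmp_to_key(compare)))  # [1, 2, 3]
```

cmp_to_key wraps each item in a small object whose comparison methods call the comparator, so it keeps the memory footprint of a comparison sort while fitting the key-function API.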

On 04/12/2017 at 14:16, Steven D'Aprano wrote:
We're taking something which belongs in the report generator or collection, the knowledge of how to sort a collection of unordered values, and baking it into the values themselves. (Effectively making them ordered!)

It is also possible to use this __key__ method for classes for which the ordering is indeed unambiguously defined, e.g.:
    class MyValue:
        def __init__(self, value, comment):
            self.value = value
            self.comment = comment

        def __key__(self):
            return self.value

Then it is not shocking to define a sorting key.

On Tue, Dec 5, 2017 at 8:54 AM, Julien Salort <listes@salort.eu> wrote:
On 04/12/2017 at 14:16, Steven D'Aprano wrote:
We're taking something which belongs in the report generator or collection, the knowledge of how to sort a collection of unordered values, and baking it into the values themselves. (Effectively making them ordered!)
It is also possible to use this __key__ method for classes for which the ordering is indeed unambiguously defined, e.g.:
    class MyValue:
        def __init__(self, value, comment):
            self.value = value
            self.comment = comment

        def __key__(self):
            return self.value

Then it is not shocking to define a sorting key.
    MyValue = namedtuple('MyValue', ['value', 'comment'])

Job done :)

ChrisA
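[Editor's note: a quick sketch of why the namedtuple suggestion works. Tuples compare element-wise, so instances sort by `value` first.]

```python
from collections import namedtuple

MyValue = namedtuple('MyValue', ['value', 'comment'])

vals = [MyValue(3, "c"), MyValue(1, "a"), MyValue(2, "b")]
# Tuples compare field by field, so sorting orders by 'value' first:
print([v.value for v in sorted(vals)])  # [1, 2, 3]
```

One subtle difference from a dedicated __key__: `comment` still participates as a tie-breaker when values are equal, which may or may not be what you want.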

On Sun, Dec 03, 2017 at 03:06:02PM -0800, Chris Barker wrote:
Recent python has moved toward a "key" function for customized sorting:
list.sort(key=key_fun)
key is also used (according to https://docs.python.org/3.6/library/functools.html#functools.cmp_to_key) in:
min(), max(), heapq.nlargest(), heapq.nsmallest(), itertools.groupby()
with this fairly broad use, it seems it's becoming a fairly universal protocol for ordering.
Be careful: there are two different concepts here, which are only loosely related:

- ordering values, that is, whether or not we can say that x < y;
- sorting values in a collection.

By default, we sort by the inherent order of the values. But if the values have no inherent order (they are unordered), we can sort unordered items in a collection by providing an appropriate key function. Hence why I say they are loosely related.

For example, we can sort the normally unordered complex numbers by providing a key function:

    py> sorted([1+8j, 0+1j, 5+2j, 3-2j], key=lambda z: (z.real, z.imag))
    [1j, (1+8j), (3-2j), (5+2j)]

But conceptually, I'm imposing an order on an otherwise unordered data type. Complex numbers inherently have no order: it makes no sense to say that 1+8j is less than 3-2j. But since the items in the list have to be in *some* one-dimensional order, I can choose whichever order makes sense for *this* collection:

    py> sorted([1+8j, 0+1j, 5+2j, 3-2j], key=lambda z: abs(z))
    [1j, (3-2j), (5+2j), (1+8j)]

Another collection of the same values might be ordered differently. It doesn't make sense to put that functionality into the complex numbers themselves: complex numbers are unordered, and any order we impose on them comes from the collection, not the individual numbers.
However, if you are writing a custom class, and want to make it "sortable", you need to define (some of) the total comparison operators, which presumably are then called O(n logn) number of times for comparisons when sorting.
Or provide a sort key function when you actually do the sorting, which requires some inside knowledge of the objects you are sorting.
This is conflating the two distinct concepts: the comparison operators apply to the values in the collection; the key function applies to the collection itself (although it does need to have inside knowledge of the items in the collection).
But what if there was a sort key magic method:
__key__ or __sort_key__ (or whatever)
that would be called by the sorting functions if:
no key function was specified
and
it exists
It seems this would provide an easy way to make custom classes sortable that would be nicer for end users (not writing key functions), and possibly more performant in the "usual" case.
I'm not sure how you can talk about performance of the __key__ method without knowing the implementation :-) If this __key__ method is called like __lt__, then the big O() behaviour will be the same, worst case, O(n log n). If it is called like the key function, then the big O() behaviour will be the same as the key function now. Either way, you're just changing the name of the function called, not how it is called.
In fact, it strikes me that there may well be classes that define the comparison magic methods not because they want the objects to "work" with the comparison operators, but because they want them to work with sort, and min, and max, and...
It is conceivable, I suppose, but if I were designing an unordered data type (like complex) I wouldn't implement ordering operators to allow sorting, I'd provide a separate key function. But that assumes that there's only ever one way to order a collection of unordered items. I don't think that's a safe assumption. Again, look at complex above, there are at least three obvious ways:

- order by magnitude;
- order by real part first, then imaginary part;
- order by the absolute values of the real and imaginary parts (so that 1+2j and -1-2j sort together).

I don't think it makes sense to bake into an unordered data type a single way of ordering. If there was such a natural order, then the data type wouldn't be unordered and you should just define the comparison operators.
hmm, perhaps a __key__ method could even be used by the comparison operators, though that could result in pretty weird results when comparing two different types.
If the comparison operators fell back to calling __key__ when __lt__ etc aren't defined, that would effectively force unordered types like complex to be ordered.
So: has this already been brought up and rejected?
Am I imagining the performance benefits?
Probably.
Is sorting-related functionality too special-case to deserve a protocol?
Yes. -- Steve

04.12.17 01:06, Chris Barker пише:
So: has this already been brought up and rejected?
https://bugs.python.org/issue20632
Am I imagining the performance benefits?
This will add an additional overhead. This will be even slower than passing the key function, since you will need to look up the __key__ method in every item. And there will be an overhead even in the case when the __key__ method is not defined.
Is sorting-related functionality too special-case to deserve a protocol?
Yes, it is too special-case. I don't see any advantages in comparison with defining the __lt__ method. It will be rather confusing if different methods of sorting produce inconsistent order, that is, if the first item of the sorted list is not the smallest item of the list. But the idea of the class decorator looks more sane to me.

On Mon, 4 Dec 2017 08:45:55 +0200 Serhiy Storchaka <storchaka@gmail.com> wrote:
04.12.17 01:06, Chris Barker пише:
So: has this already been brought up and rejected?
https://bugs.python.org/issue20632
Am I imagining the performance benefits?
This will add an additional overhead. This will be even slower than passing the key function, since you will need to look up the __key__ method in every item.
That is a reasonable objection. However, looking up a tp_XXX slot is very cheap.
Yes, it is too special-case. I don't see any advantages in comparison with defining the __lt__ method.
There are definitely advantages. Sorting calls __lt__ for each comparison (that is, O(n log n) times) while __key__ would only be called once per item at the start (that is, O(n) times).
It will be rather confusing if different methods of sorting produce inconsistent order.
If __key__ is inconsistent with __lt__, it is the same error as making __lt__ inconsistent with __gt__. And there could be a decorator that generates all comparison methods from __key__. Regards Antoine.
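[Editor's note: one possible shape for the decorator Antoine mentions. The name `total_ordering_by_key` and its details are invented for illustration; a real version would return NotImplemented for mismatched types.]

```python
import operator

def total_ordering_by_key(cls):
    """Class decorator: derive all six comparison methods from __key__."""
    for name, op in [('__lt__', operator.lt), ('__le__', operator.le),
                     ('__gt__', operator.gt), ('__ge__', operator.ge),
                     ('__eq__', operator.eq), ('__ne__', operator.ne)]:
        def method(self, other, _op=op):
            # _op=op binds the operator at definition time
            return _op(self.__key__(), other.__key__())
        setattr(cls, name, method)
    return cls

@total_ordering_by_key
class Account:
    def __init__(self, balance):
        self.balance = balance
    def __key__(self):
        return self.balance

print(Account(10) < Account(20))  # True: compares the keys, not the objects
```

Because every operator goes through the same __key__, the generated comparisons cannot disagree with each other, which addresses the consistency worry.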

On Mon, Dec 04, 2017 at 12:06:38PM +0100, Antoine Pitrou wrote:
There are definitely advantages. Sorting calls __lt__ for each comparison (that is, O(n log n) times) while __key__ would only be called once per item at the start (that is, O(n) times).
Passing a key function doesn't magically turn a O(n log n) comparison sort into a O(n) sort. Once you've called the key function every time, you still have to *actually sort*, which will be up to O(n log n) calls to __lt__ on whatever __key__ returned.

The key function is just a built-in version of the old "DSU" (Decorate-Sort-Undecorate) idiom:

    values = [(key(x), x) for x in values]
    values.sort()
    values = [t[1] for t in values]

If you want this functionality, go right ahead and give your class a sortkey method, and then pass that as the explicit key function to sorted:

    sorted(collection_of_Spam, key=Spam.sortkey)

That is nice and explicit, and the method only gets looked up once. It works in any version of Python, it is fully backwards-compatible, and it requires no support from the interpreter.

I still think this is a poor object design, putting the key function in the object being sorted rather than in the report doing the sorting, but so long as this isn't blessed by the language as the One Obvious Way I don't mind so much.
It will be rather confusing if different methods of sorting produce inconsistent order.
If __key__ is inconsistent with __lt__, it is the same error as making __lt__ inconsistent with __gt__.
You seem to be assuming that __key__ is another way of spelling __lt__, rather than being a key function. If it does the same thing as __lt__, it is a comparison method, not a key function, and the name is horribly misleading.

In any case, it isn't an error for __lt__ to be inconsistent with __gt__.

    py> {1, 2, 3, 4} < {2, 3, 4, 5}
    False
    py> {1, 2, 3, 4} > {2, 3, 4, 5}
    False

Not all values have total ordering.
And there could be a decorator that generates all comparison methods from __key__.
Defining a single comparison method is not enough to define the rest. You need __eq__ and one comparison method. (Technically we could use __ne__ and one other, but __eq__ is usual.)

But why not just define __lt__ and use total_ordering, instead of defining two identical decorators that differ only in the name of the dunder method they use? The purpose of __key__ was supposed to be to eliminate the need to define __lt__. It seems ridiculous to then use __key__ to define __lt__ when the whole point of __key__ is to avoid needing to define __lt__.

--
Steve
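[Editor's note: for comparison, the existing route described above: define __eq__ and __lt__ in terms of whatever key you like and let functools.total_ordering fill in the rest. The `Version` class is an invented example.]

```python
from functools import total_ordering

@total_ordering
class Version:
    """Invented example: order version strings numerically."""
    def __init__(self, text):
        self.text = text
    def _key(self):
        # Ordinary helper method, not a dunder
        return tuple(int(p) for p in self.text.split('.'))
    def __eq__(self, other):
        return self._key() == other._key()
    def __lt__(self, other):
        return self._key() < other._key()

print(Version("1.10") > Version("1.9"))  # True: numeric, not lexicographic
```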

On Mon, 4 Dec 2017 23:16:11 +1100 Steven D'Aprano <steve@pearwood.info> wrote:
On Mon, Dec 04, 2017 at 12:06:38PM +0100, Antoine Pitrou wrote:
There are definitely advantages. Sorting calls __lt__ for each comparison (that is, O(n log n) times) while __key__ would only be called once per item at the start (that is, O(n) times).
Passing a key function doesn't magically turn a O(n log n) comparison sort into a O(n) sort.
Where did I say it did?
Once you've called the key function every time, you still have to *actually sort*, which will be up to O(n log n) calls to __lt__ on whatever __key__ returned.
Yes... and the whole point is for __key__ to return something which is very cheap to compare, such that there are O(n) expensive calls to __key__ and O(n log n) cheap calls to __lt__, rather than O(n log n) expensive calls to __lt__.
If __key__ is inconsistent with __lt__, it is the same error as making __lt__ inconsistent with __gt__.
You seem to be assuming that __key__ is another way of spelling __lt__, rather than being a key function.
It isn't. It's just supposed to be consistent with it, just like __hash__ is supposed to be consistent with __eq__, and no one reasonable chastises Python for allowing __hash__ to be defined independently of __eq__. Also please note Serhiy's sentence I was responding to: """It will be rather confusing if different methods of sorting produce inconsistent order."""
In any case, it isn't an error for __lt__ to be inconsistent with __gt__. [...] Not all values have total ordering.
As usual you should try to understand what people are trying to say instead of imagining they are incompetent. In the present case, it would definitely be an inconsistency (and a programming error) to have both `a < b` and `b < a` return true.
Defining a single comparison method is not enough to define the rest.
How about you stick to the discussion? I'm talking about deriving comparison methods from __key__, not from another comparison method. Defining __key__ is definitely enough to define all comparison methods. Regards Antoine.

On Mon, Dec 04, 2017 at 01:52:19PM +0100, Antoine Pitrou wrote:
On Mon, 4 Dec 2017 23:16:11 +1100 Steven D'Aprano <steve@pearwood.info> wrote:
On Mon, Dec 04, 2017 at 12:06:38PM +0100, Antoine Pitrou wrote:
There are definitely advantages. Sorting calls __lt__ for each comparison (that is, O(n log n) times) while __key__ would only be called once per item at the start (that is, O(n) times).
Passing a key function doesn't magically turn a O(n log n) comparison sort into a O(n) sort.
Where did I say it did?
See the text quoted from you above. You said there are "definitely [performance] advantages" by using a key function. You then compare:

- calling __lt__ O(n log n) times, versus
- calling the key function O(n) times.

This is a classic "apples versus oranges" comparison. You compare *actually sorting the list* with *not sorting the list* and conclude that the key function provides a performance advantage. Yes, the key function gets called O(n) times. And that's not enough to sort the list, you still have to actually sort, exactly as I said.

So where do these performance advantages come from? As I said in another post, the overhead of decorating the list with the key function makes it rather unlikely that this will be faster than just sorting it. It can happen, if key(x).__lt__ is sufficiently faster than x.__lt__, but that would be unusual.
Once you've called the key function every time, you still have to *actually sort*, which will be up to O(n log n) calls to __lt__ on whatever __key__ returned.
Yes... and the whole point is for __key__ to return something which is very cheap to compare, such that there are O(n) expensive calls to __key__ and O(n log n) cheap calls to __lt__, rather than O(n log n) expensive calls to __lt__.
Read Chris' post again. The point he was making is that the class might only define __lt__ in order to support sorting, and if we allow it to define a key function instead the class can avoid adding __lt__ at all. There's no requirement or expectation that __lt__ is expensive. If you can order the values using a cheap method and an expensive method, why would you define __lt__ to use the expensive method instead of the cheap method? The point is to avoid defining __lt__ at all, and still support sorting. But even if we define both... what makes you think that x.__lt__ is expensive (if it exists) and key(x).__lt__ is cheap? It might be the other way. If they are different, there's no guarantee about which is cheaper. If they are the same, then one is redundant.
If __key__ is inconsistent with __lt__, it is the same error as making __lt__ inconsistent with __gt__.
You seem to be assuming that __key__ is another way of spelling __lt__, rather than being a key function.
It isn't.
Right -- you've now clarified your position. Thank you. It wasn't clear from your earlier posts.
In any case, it isn't an error for __lt__ to be inconsistent with __gt__. [...] Not all values have total ordering.
As usual you should try to understand what people are trying to say instead of imagining they are incompetent.
Instead of getting your nose out of joint and accusing me of "imagining [you] are incompetent", and assuming that I didn't "try to understand what people are trying to say", how about *you* do the same? Don't assume I'm an idiot too stupid to understand your perfectly clear words, rather consider the possibility that maybe I'm reading and responding to you in good faith, but I'm not a mind-reader. If you failed to get your message across, perhaps the fault lies in your post, not my reading comprehension. In any case, whoever is to blame for the misunderstanding, the only one who can do anything about it is the writer. The writer should take responsibility for not being clear enough, rather than blaming the reader. [...]
Defining a single comparison method is not enough to define the rest.
How about you stick to the discussion? I'm talking about deriving comparison methods from __key__, not from another comparison method. Defining __key__ is definitely enough to define all comparison methods.
Here's a key function I've used:

    def key(string):
        return string.strip().casefold()

Convert to a key method:

    def __key__(self):
        return self.strip().casefold()

Is it your idea to define __lt__ and others like this?

    def __lt__(self, other):
        # ignoring the possibility of returning NotImplemented
        return self.__key__() < other.__key__()

Fair enough. I'm not convinced that's going to offer definite performance advantages, but like the total_ordering decorator, presumably if we're using this, performance is secondary to convenience.

Nor do I think this decorator needs to take an implicit dunder method, when it can take an explicit key function.

--
Steve
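[Editor's note: a sketch of that last suggestion: a class decorator that takes an explicit key function rather than looking up a dunder. The `ordered_by` name is invented.]

```python
def ordered_by(keyfunc):
    """Class decorator deriving __lt__ and __eq__ from an explicit key
    function (sketch; pair with functools.total_ordering for the rest)."""
    def decorate(cls):
        cls.__lt__ = lambda self, other: keyfunc(self) < keyfunc(other)
        cls.__eq__ = lambda self, other: keyfunc(self) == keyfunc(other)
        return cls
    return decorate

@ordered_by(lambda tag: tag.text.strip().casefold())
class Tag:
    def __init__(self, text):
        self.text = text

print(Tag("  Apple ") == Tag("apple"))  # True: compared via the key
```

The key function stays visible at the decoration site, rather than being hidden behind a dunder lookup.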

On Tue, 5 Dec 2017 02:52:44 +1100 Steven D'Aprano <steve@pearwood.info> wrote:
On Mon, Dec 04, 2017 at 01:52:19PM +0100, Antoine Pitrou wrote:
On Mon, 4 Dec 2017 23:16:11 +1100 Steven D'Aprano <steve@pearwood.info> wrote:
On Mon, Dec 04, 2017 at 12:06:38PM +0100, Antoine Pitrou wrote:
There are definitely advantages. Sorting calls __lt__ for each comparison (that is, O(n log n) times) while __key__ would only be called once per item at the start (that is, O(n) times).
Passing a key function doesn't magically turn a O(n log n) comparison sort into a O(n) sort.
Where did I say it did?
See the text from you quoted above. You said there are "definitely [performance] advantages" by using a key function. You then compare:
- calling __lt__ O(n log n) times, versus
- calling the key function O(n) times.
This is a classic "apples versus oranges" comparison. You compare *actually sorting the list* with *not sorting the list* and conclude that the key function provides a performance advantage.
At this point, I can only assume you are trolling by twisting my words... even though you later quote the part which explicitly clarifies that I was *not* saying what you claim I did. Why you seem to think that is contributing anything to the discussion rather than derailing it is beyond me. In any case, don't expect further responses from me. Regards Antoine.

I'm +1 on this idea for the most part. I agree particularly with the idea that it is better OOP for an object to access its member variables to create the key than an external container to do so.
and then sort like this: sorted(list_of_attrdicts, key=AttrDict._id_order)
This is certainly a good pattern to use in the current and older versions, but I think we can all agree that defining __key__ and calling "sorted(list_of_attrdicts)" has that syntactic sugar that is oh-so-sweet-and-tasty.

This will add an additional overhead. This will be even slower than passing the key function, since you will need to look up the __key__ method in every item. And there will be an overhead even in the case when the __key__ method is not defined.

This, to me, is the only possible negative. I would be most interested to see how much of an effect this would have on real-world data that doesn't have __key__ defined.

I may be new to this community but Steven D'Aprano and Antoine Pitrou, you guys bicker like my parents before they got a divorce. I'm pretty sure you're both veterans and so know how to behave yourselves. Please set the tone according to how you'd like us newbies to respond. -Brent

On 12/04/2017 10:01 AM, brent bejot wrote:
This is certainly a good pattern to use in the current and older versions, but I think we can all agree that defining __key__ and calling "sorted(list_of_attrdicts)" has that syntactic sugar that is oh-so-sweet-and-tasty.
Actually, no, we do not all agree. ;-) -- ~Ethan~

On 04/12/17 18:01, brent bejot wrote:
I'm +1 on this idea for the most part.
I agree particularly with the idea that it is better OOP for an object to access its member variables to create the key than an external container to do so.
and then sort like this: sorted(list_of_attrdicts, key=AttrDict._id_order)
Isn't this exactly what the operator module's itemgetter and attrgetter already give you? -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence
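[Editor's note: for reference, a sketch of that existing stdlib approach; the Record class and dict rows below are invented examples.]

```python
from operator import attrgetter, itemgetter

# Hypothetical records, to show what attrgetter/itemgetter already provide.
class Record:
    def __init__(self, ident, name):
        self.ident = ident
        self.name = name

records = [Record(3, "c"), Record(1, "a"), Record(2, "b")]
by_id = sorted(records, key=attrgetter("ident"))   # key on an attribute
print([r.ident for r in by_id])                    # prints [1, 2, 3]

rows = [{"id": 3}, {"id": 1}, {"id": 2}]
by_key = sorted(rows, key=itemgetter("id"))        # key on a mapping item
print([r["id"] for r in by_key])                   # prints [1, 2, 3]
```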

wow! a few time zones (and a day job) really make a difference to taking part in a discussion :-) Thanks for all the feedback. From what I can tell, we don't have a consensus, though it's looking pretty unlikely that this is going to be adopted (though maybe the decorator idea). But I'm going to go through and try to summarize what we (I?) have learned, and what does seem to be agreed upon, or not. If I misinterpret something you said, or take a quote out of context, please correct it, but trust that I didn't do it on purpose.... Also, it's kind of a pain to do a digest like this and properly attribute everyone, so mostly I'm not going to attribute the quotes... So: has this already been brought up and rejected?
https://bugs.python.org/issue20632 Thanks Serhiy -- I didn't think to look in the Bug DB. This does remind me that it would be good to have a place (maybe in a meta-PEP?) to put topics that have been discussed, but didn't get as far as anyone writing a PEP... An existence proof: in NLTK, an __lt__ method added purely to facilitate
consistent sorting (in doctests) of structured data objects for which comparison operators do not really make conceptual sense: https://github.com/nltk/nltk/pull/1902/files#diff-454368f06fd635b1e06c9bb6d65bd19bR689 Granted, calling min() and max() on collections of these objects would not make conceptual sense either. Still, __sort_key__ would have been cleaner than __lt__.
So nice to know I'm not the only one that wants (needs) to provide a sort order by default -- though it doesn't mean it's not a niche issue anyway. By default, we sort by the inherent order of the values. But if the
values have no inherent order (they are unordered), we can sort unordered items in a collection by providing an appropriate key function. Hence why I say they are loosely related.
<snip>
It doesn't make sense to put that functionality into the complex numbers themselves: complex numbers are unordered, and any order we impose on them comes from the collection, not the individual numbers.
Sure -- and that's why we definitely still need a key option for the sorting functions, and why not all classes should define a sort order. But for many classes, it does make sense to define a default sort order. Take something as simple as strings -- they have a default sort order, but a user very well might want to sort them differently -- say case-insensitive, or ... So I think there is consensus here (and no one was proposing otherwise): - Defining a sort order is NOT required of all classes - The sorting methods definitely need to allow a custom key function. the comparison operators
apply to the values in the collection; the key function applies to the collection itself
But many (most) objects DO provide a default sort order, by means of the comparison methods. So the idea that objects have a default sort order is already pretty well accepted :-) But that assumes that there's only ever one way to order a collection of
unordered items.
No, it doesn't -- it implies there is a default way, which is what is done already for most types. And again, some classes shouldn't even have a default. I'm not sure I understand the motivation to make elements *sortable* but
not comparable. If an arbitrary order is still useful, I'd think you'd want to be able to tell how two particular elements *would* sort by asking a<b.
yeah, I'm not sure about this -- going back to the example of complex numbers, I like the idea of providing a default sort order but not make the declaration that this is how the objects SHOULD be compared -- after all, you can provide your own key function, but not so easily re-define the comparisons...
Is sorting-related functionally too special-case to deserve a protocol?
Yes, it is too special-case. I don't see any advantages in comparison with defining the __lt__ method. It will be rather confusing if different methods of sorting produce inconsistent order. If the first item of the sorted list is not the smallest item of the list.
well, yeah, but not any different than the fact that you can inconsistently define the comparison operators already. What I wasn't realizing is that you only need to define __lt__ (and __eq__) to get sortability. Maybe it would be good to document that more prominently somewhere (if it's documented at all, other than the source). But the idea of the class decorator looks more sane to me. yeah, I'm liking that. Most often when I define equality and comparison dunder methods for a
custom class, I'm effectively just deferring the comparison to some field or tuple of fields of the object.
Exactly -- which may be an argument for a decorator, rather than a dunder method -- particularly if performance isn't improved by the dunder method. What if we added a @key_ordering decorator, like @total_ordering but
using __key__ to generate the comparisons?
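[Editor's note: a rough sketch of what that proposed decorator might look like; key_ordering is a hypothetical name and this is one possible implementation, not a vetted design.]

```python
import operator

def key_ordering(cls):
    # Hypothetical class decorator: generate all six rich comparisons
    # from the class's __key__ method, analogous to functools.total_ordering.
    def make(op):
        def method(self, other):
            if not isinstance(other, cls):
                return NotImplemented
            return op(self.__key__(), other.__key__())
        return method
    for name, op in [('__eq__', operator.eq), ('__ne__', operator.ne),
                     ('__lt__', operator.lt), ('__le__', operator.le),
                     ('__gt__', operator.gt), ('__ge__', operator.ge)]:
        setattr(cls, name, make(op))
    return cls

@key_ordering
class Version:
    def __init__(self, major, minor):
        self.major, self.minor = major, minor

    def __key__(self):
        return (self.major, self.minor)

print(Version(1, 2) < Version(1, 10))   # prints True
```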
This could be a good idea -- just putting it here for the record as it's mentioned elsewhere. OK -- multiple posts about performance, I'm going to try to summarize:
This will add an additional overhead. This will be even slower than
passing the key function, since you will need to look up the __key__ method in every item. That is a reasonable objection. However, looking up a tp_XXX slot is very cheap. Yes, it is too special-case. I don't see any advantages in comparison with defining the __lt__ method. There are definitely advantages. Sorting calls __lt__ for each comparison (that is, O(n log n) times) while __key__ would only be called once per item at the start (that is, O(n) times).
OK -- so: if a new slot is added, then the performance hit of __key__ lookup is minor, but every type now has a new slot, so there is another performance (and memory) hit for that everywhere -- this is too niche to want to hit every type. Any chance that adding a __key__ slot would help with other built-in types by being faster than checking __lt__ -- probably not. Conversely, when sorting a key can provide significant performance
improvements. A single O(n) pass to compute keys, followed by O(n log(n)) comparisons could be significantly faster,
snip
It can happen, if key(x).__lt__ is sufficiently faster than x.__lt__, but that would be unusual.
actually, that's quite common :-) -- but a __sort_key__ method would add n extra attribute lookups, too, so now it's less likely that it'll be faster than __lt__. This is "key" :-) -- it is very common for the sort key to be a very simple type, say int or float, that is very fast to compare -- so this can be a real performance saver when passing a key function. Also, I recall that someone was working on special-casing the sort code to do even faster sorting on simple numeric keys -- not sure what came of that though. But the extra method lookup for every key could overwhelm that anyway :-(
And there will be an overhead even in the case when the __key__ method is not defined.
not good. I can't think of a way to profile this easily -- we know that having a key function can be helpful, but that doesn't take into account the extra method lookup -- maybe a key function that involves a method lookup?? If it defines both, it isn't clear which will be used for sorting.
Should __lt__ take priority, or __key__? Whichever we choose, somebody is going to be upset and confused by the choice.
__sort_key__ would take priority -- that is a no brainer, it's the sort key, it's used for sorting. And __lt__ giving a different result is no more surprising, and probably less surprising, than total ordering being violated in any other way.
I disagree with the design of this: it is putting the decision of how to sort uncomparable objects in the wrong place, in the object itself rather than in the report that wants to sort them.
No -- it's putting the decision about a default sort order in the object itself -- maybe a niche case, but a real one. If we have a __key__ method on a class, then the following becomes true:
* We can work out which of 2 instances is bigger/smaller using max/min. * We can compare two items by doing a sort. So while the *intent* may not be to allow comparisons, that's what you've done.
Sure, but Python is for consenting adults -- we allow a lot of things that aren't encouraged... I don't think leveraging sort to get a comparison is a problematic use case at all -- though "min" and "max" is a different matter ... maybe they shouldn't use __sort_key__? though that does weaken the whole idea :-) But why not just define __lt__ and use total_ordering, instead of
defining two identical decorators that differ only in the name of the dunder method they use?
well, one reason is that total_ordering can result in even worse performance... though probably not a huge deal. I never imagined that it would be a required method. Of course it is
optional. But by adding interpreter support, we're blessing something which I think is a misguided design choice as the One Obvious Way.
I don't think it's interpreter support, but rather standard library support -- it would be changes in the list.sort and sorted, and ... functions, not any change to the interpreter. (and it would be a change to the language definition) But your point is the same. However, I think there is no consensus that it's a misguided design choice -- having an explicit way to specifically say "this is the default way to sort these objects", when there is no other reason for total ordering, feels pythonic to me. Again, maybe a niche use case though. If people like Chris' idea, just add a sortkey() method to your class,
and then call sorted(collection_of_Spam, key=Spam.sortkey) and it will Just Work.
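[Editor's note: that suggested convention, spelled out as a sketch; Spam and its tin_size field are invented for illustration.]

```python
# An ordinary (non-dunder) sortkey method, passed explicitly as the key.
class Spam:
    def __init__(self, tin_size):
        self.tin_size = tin_size

    def sortkey(self):
        return self.tin_size

collection_of_spam = [Spam(5), Spam(1), Spam(3)]
# The plain function works as a key function: Spam.sortkey(item) == item.sortkey()
ordered = sorted(collection_of_spam, key=Spam.sortkey)
print([s.tin_size for s in ordered])   # prints [1, 3, 5]
```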
Not a bad approach, though that's not as useful until/unless it becomes something of a standard. (and for the record, if we did add __sort_key__, then on older versions you could still do: sorted(collection_of_Spam, key=Spam.__sort_key__) so too bad on the compatibility front...) The decorator idea LGTM. But it doesn't need the special __key__ method.
Just pass the key function as a decorator argument. This idea was proposed more than 3.5 years ago. Is there a PyPI package that implements it?
yes, it was, though not in a terribly public place / way... I would find it cleaner to express it as a method in the class's body
("__key__" or anything else) rather than have to pass a function object. Also, it's easier to unit-test if it officially exists as a method...
me too, though couldn't we do both? The decorator, if not passed a sort function, would look for __key__. I wondered what the performance would be and tested the following code: <snip> it outputs this for me with python 3.6.0
10000 key 0.010628s 10000 calls lt 0.053690s 119886 calls It seems that providing a key is ~5 times faster than depending on __lt__. (I even used a short circuit to see if __lt__ could beat key.)
yeah, this is what I expect for a key function -- exactly why I started all this expecting a performance gain. But as others have pointed out, this is a key function, not a __key__ method that would need to be called each time. I'm going to try to benchmark (a simulation of) that. This is Barry's code, with the addition of an "outer_key" function that calls the instance's key method: [I got very similar results as Barry with his version: about 5X faster with the key function]

    def outer_key(item):
        return item.key()

so we get a function lookup each time it's used. However, I'm confused by the results -- essentially NO change. That extra method lookup is coming essentially for free. And this example is using a tuple as the key, so not the very cheapest possible sort key. Did I make a mistake? Is that lookup somehow cached?

    In [36]: run sort_key_test.py
    10000
          key 0.012529s  10000 calls
    outer_key 0.012139s  10000 calls
           lt 0.048057s 119877 calls

each run gives different results, but the lt method is always on the order of 5X slower for this size list. Sometimes outer_key is faster, mostly a bit slower, than key. Also, I tried making a "simpler" __lt__ method:

    return (self.value1, self.value2) < (other.value1, other.value2)

but it was a bit slower than the previous one -- interesting. Then I tried a simpler (but probably common) single-attribute sort:

    def __lt__(self, other):
        global lt_calls
        lt_calls += 1
        return self.value1 < other.value1

    def key(self):
        global key_calls
        key_calls += 1
        return self.value1

And that results in about a 6X speedup:

    In [42]: run sort_key_test.py
    10000
          key 0.005157s  10000 calls
    outer_key 0.007000s  10000 calls
           lt 0.041454s 119877 calls
    time ratio: 5.922036784741144

And, interestingly (to me!), there is even a performance gain for only a 10 item list! (1.5X or so, but still.) In fact, this seems to show that having a __key__ method would often provide a performance boost, even for small lists.
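[Editor's note: the modified benchmark itself was sent as an attachment and is not reproduced inline; the following is a reconstruction combining Barry's MyObject and timing harness with the outer_key wrapper as described. The details are an assumption, not the actual attached code.]

```python
import random
import time

class MyObject:
    def __init__(self, value1, value2):
        self.value1 = value1
        self.value2 = value2

    def __lt__(self, other):
        # Tuple-based comparison, as in one of the variants discussed above.
        return (self.value1, self.value2) < (other.value1, other.value2)

    def key(self):
        return self.value1, self.value2

def outer_key(item):
    # Module-level function that calls the instance's key() method,
    # simulating the per-item method lookup a __sort_key__ protocol would need.
    return item.key()

items = [MyObject(v, v - 50) for v in range(10000)]
random.shuffle(items)

# Time three strategies: bound key method, outer_key wrapper, and plain __lt__.
for label, kwargs in [('key', {'key': MyObject.key}),
                      ('outer_key', {'key': outer_key}),
                      ('lt', {})]:
    data = items[:]
    s = time.time()
    data.sort(**kwargs)
    print('%9s %.6fs' % (label, time.time() - s))
```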
(still to try to figure out -- how much would the lookup of __key__ slow things down when it didn't exist....) I have got to be doing something wrong here -- I hope some of you smarter folks will tell me what :-) Code enclosed. -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On 5 Dec 2017, at 01:06, Chris Barker <chris.barker@noaa.gov> wrote:
wow! a few time zones (and a day job) really make a difference to taking part in a discussion :-)
This could be a good idea -- just putting it here for the record as it's mentioned elsewhere.
I can't think of a way to profile this easily -- we know that having a key function can be helpful, but that doesn't take into account the extra method lookup -- maybe a key function that involves a method lookup??
If it defines both, it isn't clear which will be used for sorting. Should __lt__ take priority, or __key__? Whichever we choose, somebody is going to be upset and confused by the choice.
__sort_key__ would take priority -- that is a no brainer, it's the sort key, it's used for sorting. And __lt__ giving a different result is no more surprising, and probably less surprising, than total ordering being violated in any other way.
If by no brainer you mean the performance of __sort_key__ is always better than __lt__, then I will ask for a proof in the form of benchmarks with enough use-case coverage.
[I got very similar results as Barry with his version: about 5X faster with the key function]
def outer_key(item): return item.key()
so we get a function lookup each time it's used.
However, I'm confused by the results -- essentially NO change. That extra method lookup is coming essentially for free. And this example is using a tuple as the key, so not the very cheapest possible sort key.
Did I make a mistake? is that lookup somehow cached?
In [36]: run sort_key_test.py 10000 key 0.012529s 10000 calls outer_key 0.012139s 10000 calls lt 0.048057s 119877 calls
each run gives different results, but the lt method is always on the order of 5X slower for this size list. Sometimes outer_key is faster, mostly a bit slower, than key.
Also, I tried making a "simpler" __lt__ method:
return (self.value1, self.value2) < (other.value1, other.value2)
but it was a bit slower than the previous one -- interesting.
This is more expensive to execute than my version for 2 reasons. 1) my __lt__ did not need to create any tuples. 2) my __lt__ can exit after only looking at the value1's
Then I tried a simpler (but probably common) simple attribute sort:
def __lt__(self, other): global lt_calls lt_calls += 1
return self.value1 < other.value1
def key(self): global key_calls key_calls += 1
return self.value1
And that results in about a 6X speedup
In [42]: run sort_key_test.py 10000 key 0.005157s 10000 calls outer_key 0.007000s 10000 calls lt 0.041454s 119877 calls time ratio: 5.922036784741144
And, interestingly (to me!), there is even a performance gain for only a 10 item list! (1.5X or so, but still)
My guess is that this is because the __lt__ test on simple types is very fast in python. Barry

If by no brainer you mean the performance of __sort_key__ is always better than __lt__

No. By no-brainer I meant that IF there is a __sort_key__ defined, then it should be used for sorting, regardless of whether __lt__ is also defined. (min and max should probably prefer __lt__)

I will ask for a proof in the form of benchmarks with enough use-case coverage.

Indeed -- I was surprised that it helped so much, and didn't seem to hurt for the one example. But the greater concern is that this will affect every sort (that doesn't provide a key function), so if there is any noticeable performance hit, that probably kills the idea. And the most risky example is lists of ints or floats -- which are very fast to compare, so adding a method lookup could be a big impact. I'm not sure how to benchmark that without hacking the sorting C code though.

I'd still love to know if my benchmark attempt was at all correct for custom classes in any case. -CHB

_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/

On 04/12/17 18:01, brent bejot wrote:
I'm +1 on this idea for the most part.
I agree particularly with the idea that it is better OOP for an object to access its member variables to create the key than an external container to do so.
This I'm absolutely fine with. Key methods are something to encourage. The problem that I have is that once you get beyond simple lists of numbers or strings, there often isn't a particularly obvious sort order, or rather there are often multiple obvious sort orders and you may want any of them. In fact I'd go so far as to suggest that there _usually_ isn't a single obvious sort order for non-trivial classes. To take a non-Python example, I'm in charge of the reading rota at my local church, and I keep a spreadsheet of readers to help me. Usually I sort that list by the date people last read the lesson, so I can quickly tell who I should ask next. When the newsletter asks me for a list of readers, though, I sort them alphabetically by surname, which most people would think of as the natural sorting order. -- Rhodri James *-* Kynesim Ltd

04.12.17 13:06, Antoine Pitrou пише:
On Mon, 4 Dec 2017 08:45:55 +0200 Serhiy Storchaka <storchaka@gmail.com> wrote:
04.12.17 01:06, Chris Barker пише:
So: has this already been brought up and rejected?
https://bugs.python.org/issue20632
Am I imagining the performance benefits?
This will add an additional overhead. This will be even slower than passing the key function, since you will need to look up the __key__ method in every item.
That is a reasonable objection. However, looking up a tp_XXX slot is very cheap.
But introducing a new slot is not easy. This will increase the size of the type object and break binary compatibility. According to Stefan Behnel's research (https://bugs.python.org/issue31336), most of the time of creating a new type is spent on initializing slots. Every Python program will pay this cost, even if it doesn't use the __key__ method. There are many more common and important methods that don't have a corresponding slot (__length_hint__, __getstate__, __reduce__, __copy__, keys).
Yes, it is too special-case. I don't see any advantages in comparison with defining the __lt__ method.
There are definitely advantages. Sorting calls __lt__ for each comparison (that is, O(n log n) times) while __key__ would only be called once per item at the start (that is, O(n) times).
It will call __lt__ for each key comparison (the same O(n log n) times), but *in addition* it will call __key__ O(n) times. You get a benefit only when the cost of calling __key__ is much smaller than the difference between the cost of the item's __lt__ and the key's __lt__, and maybe only for large lists. But why not just pass the key argument when you sort the large list?
It will be rather confusing if different methods of sorting produce inconsistent order.
If __key__ is inconsistent with __lt__, it is the same error as making __lt__ inconsistent with __gt__.
If __key__ is consistent with __lt__, then we can just use __lt__, and don't introduce new special methods.
And there could be a decorator that generates all comparison methods from __key__.
The decorator idea LGTM. But it doesn't need the special __key__ method. Just pass the key function as a decorator argument. This idea was proposed more than 3.5 years ago. Is there a PyPI package that implements it?

On Mon, 4 Dec 2017 15:50:31 +0200 Serhiy Storchaka <storchaka@gmail.com> wrote:
Yes, it is too special-case. I don't see any advantages in comparison with defining the __lt__ method.
There are definitely advantages. Sorting calls __lt__ for each comparison (that is, O(n log n) times) while __key__ would only be called once per item at the start (that is, O(n) times).
It will call __lt__ for each key comparison (the same O(n log n) times), but *in addition* it will call __key__ O(n) times. You get a benefit only when the cost of calling __key__ is much smaller than the difference between the cost of the item's __lt__ and the key's __lt__, and maybe only for large lists.
Sure, that's the point: your non-trivial __key__ method reduces the instance to e.g. a simple tuple or string, and then __lt__ over those keys is cheap.
But why not just pass the key argument when you sort the large list?
For the same reason that you want __lt__ (or __eq__, or __hash__) to be defined on the type, not call it manually every time you want to make a comparison: because it's really a fundamental property of the type and it "feels" wrong to have to pass it explicitly. Also there are library routines which may sort implicitly their inputs (such as pprint, IIRC, though perhaps pprint only sorts after calling str() -- I haven't checked).
If __key__ is consistent with __lt__, then we can just use __lt__, and don't introduce new special methods.
This is ignoring all the other arguments...
The decorator idea LGTM. But it doesn't need the special __key__ method. Just pass the key function as a decorator argument.
I would find it cleaner to express it as a method in the class's body ("__key__" or anything else) rather than have to pass a function object. Also, it's easier to unit-test if it officially exists as a method...
This idea was proposed more than 3.5 years ago. Is there a PyPI package that implements it?
I don't know. I know I reimplemented such a thing for Numba (but of course didn't benefit from automatic sort() support), because I needed fast hashing and equality without implementing the corresponding methods by hand every time (*). It would be a pity to depend on a third-party package just for that. (*) see https://github.com/numba/numba/blob/master/numba/types/abstract.py#L88 Regards Antoine.

I wondered what the performance would be and tested the following code:

    #!/usr/bin/env python3
    import random
    import time

    random.seed(hash('Testing Keys'))

    lt_calls = 0
    key_calls = 0

    class MyObject:
        def __init__(self, value1, value2):
            self.value1 = value1
            self.value2 = value2

        def __lt__(self, other):
            global lt_calls
            lt_calls += 1
            if self.value1 < other.value1:
                return True
            else:
                return self.value2 < other.value2

        def key(self):
            global key_calls
            key_calls += 1
            return self.value1, self.value2

    lt_list = []
    for value1 in reversed(range(10000)):
        value2 = value1 - 50
        lt_list.append(MyObject(value1, value2))

    random.shuffle(lt_list)
    key_list = lt_list[:]
    print(len(lt_list))

    s = time.time()
    key_list.sort(key=MyObject.key)
    e = time.time() - s
    print('key %.6fs %6d calls' % (e, key_calls))

    s = time.time()
    lt_list.sort()
    e = time.time() - s
    print(' lt %.6fs %6d calls' % (e, lt_calls))

it outputs this for me with python 3.6.0:

    10000
    key 0.010628s  10000 calls
     lt 0.053690s 119886 calls

It seems that providing a key is ~5 times faster than depending on __lt__. (I even used a short circuit to see if __lt__ could beat key.) I often have more than one way to sort an object. It's easy for me to provide a set of key functions that meet the needs of each sort context. I'm not sure what extra value the __sort_key__ would offer over providing a key method as I did. Barry

On Mon, 4 Dec 2017 19:37:02 +0000 Barry Scott <barry@barrys-emacs.org> wrote:
I wondered what the performance would be and tested the following code:
[...]
it outputs this for my with python 3.6.0
10000 key 0.010628s 10000 calls lt 0.053690s 119886 calls
It seems that providing a key is ~5 times faster than depending on __lt__. (I even used a short circuit to see if __lt__ could beat key.)
Thanks for taking the time to write a benchmark. I'm not surprised by the results (and your __lt__ method isn't even complicated: the gap could be very much wider). There is more to Python performance than aggregate big-O algorithmic complexity. Regards Antoine.

On 4 Dec 2017, at 20:12, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Mon, 4 Dec 2017 19:37:02 +0000 Barry Scott <barry@barrys-emacs.org> wrote:
I wondered what the performance would be and tested the following code:
[...]
it outputs this for my with python 3.6.0
10000 key 0.010628s 10000 calls lt 0.053690s 119886 calls
It seems that providing a key is ~5 times faster than depending on __lt__. (I even used a short circuit to see if __lt__ could beat key.)
Thanks for taking the time to write a benchmark. I'm not surprised by the results (and your __lt__ method isn't even complicated: the gap could be very much wider). There is more to Python performance than aggregate big-O algorithmic complexity.
I was surprised by the huge difference. I was expecting a closer race. For the record, I think that a __sort_key__ is not a good idea, as it is so easy to do what I did and define key methods on the class, without being limited to one such method. Barry
Regards
Antoine.

On Tue, Dec 5, 2017 at 8:22 AM, Barry <barry@barrys-emacs.org> wrote:
On 4 Dec 2017, at 20:12, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Mon, 4 Dec 2017 19:37:02 +0000 Barry Scott <barry@barrys-emacs.org> wrote:
I wondered what the performance would be and tested the following code:
[...]
it outputs this for my with python 3.6.0
10000 key 0.010628s 10000 calls lt 0.053690s 119886 calls
It seems that providing a key is ~5 times faster than depending on __lt__. (I even used a short circuit to see if __lt__ could beat key.)
Thanks for taking the time to write a benchmark. I'm not surprised by the results (and your __lt__ method isn't even complicated: the gap could be very much wider). There is more to Python performance than aggregate big-O algorithmic complexity.
I was surprised by the huge difference. I was expecting a closer race.
For the record, I think that a __sort_key__ is not a good idea, as it is so easy to do what I did and define key methods on the class, without being limited to one such method.
The numbers here are all fairly small, but they make a lot of sense. Calls into Python code are potentially quite slow, so there's a LOT of benefit to be had by calling Python code once per object instead of N log N times for the comparisons. Increasing the length of the list will make that difference even more pronounced. But that's an argument for using a key function, not for having a __key__ special method. ChrisA

On Mon, Dec 04, 2017 at 08:45:55AM +0200, Serhiy Storchaka wrote:
But the idea of the class decorator looks more sane to me.
The purpose of __key__ is to define a key function (not a comparison operator) for classes that aren't orderable and don't have __lt__. If you're going to then go ahead and define __lt__ and the other comparison operators, there's no point to __key__. -- Steve

04.12.17 14:25, Steven D'Aprano writes:
On Mon, Dec 04, 2017 at 08:45:55AM +0200, Serhiy Storchaka wrote:
But the idea of the class decorator looks more sane to me.
The purpose of __key__ is to define a key function (not a comparison operator) for classes that aren't orderable and don't have __lt__.
If you're going to then go ahead and define __lt__ and the other comparison operators, there's no point to __key__.
Right. The only benefit of this decorator is that it could avoid writing boilerplate code for simple cases. Just add @ordered_by_key(attrgetter('name', 'weight')). __key__ is not needed; just pass the key function as an argument of the decorator. Of course, even if it is useful, that doesn't mean it should be included in the stdlib. It could live on PyPI.
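A sketch of what such a decorator could look like (the name ordered_by_key comes from the post above; the implementation here is my own guess at its behavior):

```python
# Hypothetical class decorator: generate all six rich comparison methods
# from a key function passed as a decorator argument, so the class itself
# needs no __key__ method and no comparison boilerplate.
import operator
from operator import attrgetter

def ordered_by_key(key):
    ops = {'__lt__': operator.lt, '__le__': operator.le,
           '__gt__': operator.gt, '__ge__': operator.ge,
           '__eq__': operator.eq, '__ne__': operator.ne}

    def decorate(cls):
        def make(op):
            def method(self, other):
                if not isinstance(other, cls):
                    return NotImplemented
                return op(key(self), key(other))
            return method
        for name, op in ops.items():
            setattr(cls, name, make(op))
        return cls
    return decorate

@ordered_by_key(attrgetter('name', 'weight'))
class Item:
    def __init__(self, name, weight):
        self.name = name
        self.weight = weight
```

Instances then compare by the ('name', 'weight') tuple, so sorting, min() and max() all work without any dunder methods written by hand.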

I think this is an interesting idea, and I don't believe that either performance or "sortable vs comparable" are very relevant. I doubt there is much performance to gain here, and I think the default sort order for a class must continue to match its comparison behavior.

I think the case in favor of this idea (slightly modified, so it no longer applies only to sorting) is mostly convenience and readability. Most often when I define equality and comparison dunder methods for a custom class, I'm effectively just deferring the comparison to some field or tuple of fields of the object. E.g.

    from functools import total_ordering

    @total_ordering
    class BankAccount:
        def __init__(self, balance):
            self.balance = balance

        def __eq__(self, other):
            if isinstance(other, BankAccount):
                return self.balance == other.balance
            return NotImplemented

        def __lt__(self, other):
            if isinstance(other, BankAccount):
                return self.balance < other.balance
            return NotImplemented

It'd be nice to be able to eliminate an import, halve the lines of code, and instead write that as:

    class BankAccount:
        def __init__(self, balance):
            self.balance = balance

        def __sort_key__(self):
            return self.balance

I would expect these two to give the same behavior: instances of BankAccount should still be fully comparable and sortable, with all of these operations effectively being deferred to comparisons and sorts of the sort key.

Now for the cases against:

1. I made one important decision explicitly (twice, unfortunately) in the first code block that disappeared in the second: what "other" instances should be considered comparable to instances of BankAccount? Should it be decided structurally, like in the documentation example for `functools.total_ordering`? Should it be "any subclass of BankAccount"? Or maybe it should only be instances of BankAccount itself, not subclasses? (We just went around on this very question for PEP 557, dataclasses.) If Python added __sort_key__, it would have to just pick a behavior here, which would be unfortunate for cases where that behavior is wrong. Or maybe we could also add a __sort_allowed__ method...

2. There might actually be a performance cost here, since this wouldn't replace the existing rich comparison dunder methods, so it would add one more thing Python has to check when trying to compare two objects.

Carl
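To make the proposed protocol concrete, here is a rough sketch (purely hypothetical; nothing like this exists in CPython) of how a sort could fall back to __sort_key__ when the caller supplies no key= argument:

```python
# Hypothetical sketch of the proposed protocol, written as a wrapper around
# the real sorted(): if no key is given and the elements' type defines
# __sort_key__, use it as the key function, called once per element.
def sorted_with_dunder(iterable, key=None, reverse=False):
    items = list(iterable)
    if key is None and items and hasattr(type(items[0]), '__sort_key__'):
        key = type(items[0]).__sort_key__
    return sorted(items, key=key, reverse=reverse)

class Account:
    def __init__(self, balance):
        self.balance = balance

    def __sort_key__(self):
        return self.balance
```

With this, `sorted_with_dunder([Account(5), Account(1), Account(3)])` orders by balance without the caller writing a key function, while an explicit key= argument still wins.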

On Sun, Dec 3, 2017 at 10:48 PM, Carl Meyer <carl@oddbird.net> wrote:
It'd be nice to be able to eliminate an import, halve the lines of code, and instead write that as:
    class BankAccount:
        def __init__(self, balance):
            self.balance = balance

        def __sort_key__(self):
            return self.balance
What if we added a @key_ordering decorator, like @total_ordering but using __key__ to generate the comparisons? I know you'd have to do an import, but usually adding things to the core language requires more of a benefit than that :-). -n -- Nathaniel J. Smith -- https://vorpus.org
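A sketch of the suggested @key_ordering decorator (the name comes from the post above; the implementation is my own assumption): like functools.total_ordering, but deriving all six comparisons from the class's __key__ method.

```python
# Hypothetical @key_ordering decorator: generate the rich comparison
# methods from __key__, so the class defines ordering in one place.
import operator

def key_ordering(cls):
    def make(op):
        def method(self, other):
            if not isinstance(other, cls):
                return NotImplemented
            return op(self.__key__(), other.__key__())
        return method
    for name in ('lt', 'le', 'gt', 'ge', 'eq', 'ne'):
        setattr(cls, f'__{name}__', make(getattr(operator, name)))
    return cls

@key_ordering
class Version:
    def __init__(self, major, minor):
        self.major, self.minor = major, minor

    def __key__(self):
        return (self.major, self.minor)
```

Unlike the ordered_by_key(key) variant discussed earlier in the thread, here the key lives on the class as __key__ and the decorator takes no arguments, which is closer in spirit to total_ordering.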

On Sun, Dec 03, 2017 at 10:48:18PM -0800, Carl Meyer wrote:
I think this is an interesting idea, and I don't believe that either performance or "sortable vs comparable" are very relevant.
Performance is always relevant -- while performance shouldn't be the sole deciding factor, it should be a factor. And since the entire use-case for this is sorting versus comparison operators, I'm having trouble understanding why you think that sorting versus comparison operators is irrelevant.
I doubt there is much performance to gain here,
I doubt there is any performance gain -- rather a performance hit is far more likely.
and I think the default sort order for a class must continue to match its comparison behavior.
This proposal changes that: if a class defines __key__ but not __lt__, then the default sort behaviour will be different from the comparison behaviour. If it defines both, it isn't clear which will be used for sorting. Should __lt__ take priority, or __key__? Whichever we choose, somebody is going to be upset and confused by the choice.
Most often when I define equality and comparison dunder methods for a custom class, I'm effectively just deferring the comparison to some field or tuple of fields of the object. E.g.
from functools import total_ordering
@total_ordering class BankAccount: def __init__(self, balance): self.balance = balance
[snip example]

This example shows exactly the confusion of concepts I'm talking about. Why should bank accounts be sorted by their balance, instead of by their account number, or account name, or the date they were opened? Why should BankAccounts sort according to their balance, a quantity which can change after every transaction? What happens when somebody decides that bank accounts' default sorting should be by their account name rather than the ever-changing balance?

I'm not saying that it never makes sense to sort a bunch of accounts according to their balance. I'm saying that functionality is not part of the accounts themselves. If it belongs anywhere, it belongs in the collection of accounts. And even that is dubious: I believe that where it really belongs is in the report generator that needs to sort the collection of accounts.
It'd be nice to be able to eliminate an import
That's an argument that applies to *literally* everything in the standard library. Should we make everything a built-in? The prerequisites for eliminating the need for an import should be a *lot* higher than just "it would be nice".

I disagree with the design of this: it is putting the decision of how to sort uncomparable objects in the wrong place, in the object itself rather than in the report that wants to sort them.

-- Steve

On 4 December 2017 at 11:41, Steven D'Aprano <steve@pearwood.info> wrote:
On Sun, Dec 03, 2017 at 10:48:18PM -0800, Carl Meyer wrote:
I think this is an interesting idea, and I don't believe that either performance or "sortable vs comparable" are very relevant.
Performance is always relevant -- while performance shouldn't be the sole deciding factor, it should be a factor.
And since the entire use-case for this is sorting versus comparison operators, I'm having trouble understanding why you think that sorting versus comparison operators is irrelevant.
I'm not completely clear on what the expectation is (in terms of "sortable vs comparable") here. Clearly if a class has __lt__, it's both sortable and comparable, and that's fine. If it doesn't have __lt__, then the implication is that the class designer doesn't believe it's reasonable for it to be ordered. That's what not having comparison methods *means* (well, excepting the case that the designer didn't think of it, which is probably the case for 99% of my classes ;-))

If we have a __key__ method on a class, then the following becomes true:

* We can work out which of 2 instances is bigger/smaller using max/min.
* We can compare two items by doing a sort.

So while the *intent* may not be to allow comparisons, that's what you've done. As a result, I don't think it's an important consideration to worry about classes that "should be sortable, but shouldn't be orderable". That's basically a contradiction in terms, and will likely only come up in corner cases where for technical reasons you may need your instances to participate in sorting without raising exceptions, but you don't consider them orderable (the NLTK example mentioned above).

Conversely, when sorting, a key can provide significant performance improvements. A single O(n) pass to compute keys, followed by O(n log n) comparisons, could be significantly faster, assuming comparing precomputed keys is cheaper than extracting and comparing them on every comparison. So allowing classes to define __key__ could be a performance win over a __lt__ defined as (effectively)

    def __lt__(self, other):
        return self.__key__() < other.__key__()

Overall, I don't see any problem with the idea, although it's not something I've ever needed myself, and I doubt that in practice it will make *that* much difference.

The main practical benefit, I suspect, would be if there were an "Orderable" ABC that auto-generated the comparison methods given either __lt__ or __key__ (I could have sworn there was such an ABC for __lt__ already, but I can't find it in the library ref :-()

Paul

On Mon, 4 Dec 2017 12:19:07 +0000 Paul Moore <p.f.moore@gmail.com> wrote:
On 4 December 2017 at 11:41, Steven D'Aprano <steve@pearwood.info> wrote:
On Sun, Dec 03, 2017 at 10:48:18PM -0800, Carl Meyer wrote:
I think this is an interesting idea, and I don't believe that either performance or "sortable vs comparable" are very relevant.
Performance is always relevant -- while performance shouldn't be the sole deciding factor, it should be a factor.
And since the entire use-case for this is sorting versus comparison operators, I'm having trouble understanding why you think that sorting versus comparison operators is irrelevant.
I'm not completely clear on what the expectation is (in terms of "sortable vs comparable") here.
It's quite clear if you read what Chris Barker posted originally (and which I agree with). We're talking about deriving comparison methods from a key function *while* making sorting potentially faster, because the costly reduction operation happens O(n) times instead of O(n log n) times. Steven OTOH seems to be inventing controversies just for the sake of posting a rant. Regards Antoine.

On Sun, 3 Dec 2017 15:06:02 -0800 Chris Barker <chris.barker@noaa.gov> wrote:
I can't believe this hasn't been brought up before, but searching the web, and python-ideas, and all the PEPs has found nothing (could be my lame google-fu), so here goes:
Recent python has moved toward a "key" function for customized sorting:
list.sort(key=key_fun)
key is also used (according to https://docs.python.org/3.6/library/functools.html#functools.cmp_to_key) in:
min(), max(), heapq.nlargest(), heapq.nsmallest(), itertools.groupby()
with this fairly broad use, it seems it's becoming a fairly universal protocol for ordering.
[...] +1 from me. I would also be +1 on an optional class decorator that would generate all the ordering comparison methods (__lt__, etc.) based on the __key__ method definition. Regards Antoine.
participants (20)

- Antoine Pitrou
- Barry
- Barry Scott
- brent bejot
- Bruce Leban
- Carl Meyer
- Chris Angelico
- Chris Barker
- Chris Barker - NOAA Federal
- David Mertz
- Ethan Furman
- Greg Ewing
- Julien Salort
- Mark Lawrence
- Nathan Schneider
- Nathaniel Smith
- Paul Moore
- Rhodri James
- Serhiy Storchaka
- Steven D'Aprano