[Python-ideas] a sorting protocol dunder method?

Steven D'Aprano steve at pearwood.info
Mon Dec 4 10:52:44 EST 2017


On Mon, Dec 04, 2017 at 01:52:19PM +0100, Antoine Pitrou wrote:
> On Mon, 4 Dec 2017 23:16:11 +1100
> Steven D'Aprano <steve at pearwood.info> wrote:
> > On Mon, Dec 04, 2017 at 12:06:38PM +0100, Antoine Pitrou wrote:
> > 
> > > There are definitely advantages.  Sorting calls __lt__ for each
> > > comparison (that is, O(n log n) times) while __key__ would only be
> > > called once per item at the start (that is, O(n) times).  
> > 
> > Passing a key function doesn't magically turn a O(n log n) comparison 
> > sort into a O(n) sort.
> 
> Where did I say it did?

See the text from you quoted above. You said there are "definitely 
[performance] advantages" by using a key function. You then compare:

- calling __lt__ O(n log n) times, versus

- calling the key function O(n) times.

This is a classic "apples versus oranges" comparison. You compare 
*actually sorting the list* with *not sorting the list* and conclude 
that they key function provides a performance advantage.

Yes, the key function gets called O(n) times. And that's not enough to 
sort the list, you still have to actually sort, exactly as I said. So 
where do these performance advantages come from?

As I said in another post, the overhead of decorating the list with the 
key function makes it rather unlikely that this will be faster than just 
sorting it. It can happen, if key(x).__lt__ is sufficiently faster than 
x.__lt__, but that would be unusual.


> > Once you've called the key function every time, you still have to 
> > *actually sort*, which will be up to O(n log n) calls to __lt__ on 
> > whatever __key__ returned.
> 
> Yes... and the whole point is for __key__ to return something which is
> very cheap to compare, such that there are O(n) expensive calls to
> __key__ and O(n log n) cheap calls to __lt__, rather than O(n log n)
> expensive calls to __lt__.

Read Chris' post again. The point he was making is that the class might 
only define __lt__ in order to support sorting, and if we allow it to 
define a key function instead the class can avoid adding __lt__ at all. 
There's no requirement or expectation that __lt__ is expensive.

If you can order the values using a cheap method and an expensive 
method, why would you define __lt__ to use the expensive method instead 
of the cheap method? The point is to avoid defining __lt__ at all, and 
still support sorting.

But even if we define both... what makes you think that x.__lt__ is 
expensive (if it exists) and key(x).__lt__ is cheap? It might be the 
other way. If they are different, there's no guarantee about which is 
cheaper. If they are the same, then one is redundant.


> > > If __key__ is inconsistent with __lt__, it is the same error as making
> > > __lt__ inconsistent with __gt__.  
> > 
> > You seem to be assuming that __key__ is another way of spelling __lt__, 
> > rather than being a key function.
> 
> It isn't.

Right -- you've now clarified your position. Thank you. It wasn't clear 
from your earlier posts.


> > In any case, it isn't an error for __lt__ to be inconsistent with 
> > __gt__.
> [...]
> > Not all values have total ordering.
> 
> As usual you should try to understand what people are trying to say
> instead of imagining they are incompetent.

Instead of getting your nose out of joint and accusing me of "imagining 
[you] are incompetent", and assuming that I didn't "try to understand 
what people are trying to say", how about *you* do the same?

Don't assume I'm an idiot too stupid to understand your perfectly clear 
words, rather consider the possibility that maybe I'm reading and 
responding to you in good faith, but I'm not a mind-reader.

If you failed to get your message across, perhaps the fault lies in your 
post, not my reading comprehension. In any case, whoever is to blame for 
the misunderstanding, the only one who can do anything about it is the 
writer. The writer should take responsibility for not being clear 
enough, rather than blaming the reader.


[...]
> > Defining a single comparison method is not enough to define the rest. 
> 
> How about you stick to the discussion?  I'm talking about deriving
> comparison methods from __key__, not from another comparison method.
> Defining __key__ is definitely enough to define all comparison
> methods.

Here's a key function I've used:

def key(string):
    return string.strip().casefold()

Convert to a key method:

def __key__(self):
    return self.strip().casefold()


Is it your idea to define __lt__ and others like this?

def __lt__(self, other):
    # ignoring the possibility of returning NotImplemented
    return self.__key__() < self.__key__()


Fair enough. I'm not convinced that's going to offer definite 
performance advantages, but like the total_ordering decorator, 
presumably if we're using this, performance is secondary to convenience.

Nor do I think this decorator needs to take an implicit dunder method, 
when it can take an explicit key function.


-- 
Steve


More information about the Python-ideas mailing list