[Python-ideas] a sorting protocol dunder method?

Paul Moore p.f.moore at gmail.com
Mon Dec 4 07:19:07 EST 2017

On 4 December 2017 at 11:41, Steven D'Aprano <steve at pearwood.info> wrote:
> On Sun, Dec 03, 2017 at 10:48:18PM -0800, Carl Meyer wrote:
>> I think this is an interesting idea, and I don't believe that either
>> performance or "sortable vs comparable" are very relevant.
> Performance is always relevant -- while performance shouldn't be the
> sole deciding factor, it should be a factor.
> And since the entire use-case for this is sorting versus comparison
> operators, I'm having trouble understanding why you think that sorting
> versus comparison operators is irrelevant.

I'm not completely clear on what the expectation is (in terms of
"sortable vs comparable") here. Clearly if a class has __lt__, it's
both sortable and comparable, and that's fine. If it doesn't have
__lt__, then the implication is that the class designer doesn't
believe it's reasonable for it to be ordered. That's what not having
comparison methods *means* (well, excepting the case that the designer
didn't think of it, which is probably the case for 99% of my classes

If we have a __key__ method on a class, then the following becomes true:

    * We can work out which of 2 instances is bigger/smaller using max/min.
    * We can compare two items by doing a sort.

So while the *intent* may not be to allow comparisons, that's what you've done.

As a result, I don't think it's an important consideration to worry
about classes that "should be sortable, but shouldn't be orderable".
That's basically a contradiction in terms, and will likely only come
up in corner cases where for technical reasons you may need your
instances to participate in sorting without raising exceptions, but
you don't consider them orderable (the NLTK example mentioned above).

Conversely, when sorting a key can provide significant performance
improvements. A single O(n) pass to compute keys, followed by O(n
log(n)) comparisons could be significantly faster, assuming comparing
keys is faster than extracting them from the object. So allowing
classes to define __key__ could be a performance win over a __lt__
defined as (effectively)

    def __lt__(self, other):
        return self.__key__() < other.__key__()

Overall, I don't see any problem with the idea, although it's not
something I've ever needed myself, and I doubt that in practice it
will make *that* much difference. The main practical benefit, I
suspect, would be if there were an "Orderable" ABC that auto-generated
the comparison methods given either __lt__ or __key__ (I could have
sworn there was such an ABC for __lt__ already, but I can't find it in
the library ref :-()


More information about the Python-ideas mailing list