[Python-ideas] a sorting protocol dunder method?
barry at barrys-emacs.org
Wed Dec 6 14:47:22 EST 2017
> On 5 Dec 2017, at 01:06, Chris Barker <chris.barker at noaa.gov> wrote:
> wow! a few time zones (and a day job) really make a difference to taking part in a discussion :-)
> This could be a good idea -- just putting it here for the record as it's mentioned elsewhere.
> I can't think of a way to profile this easily -- we know that having a key function can be helpful, but that doesn't take into account the extra method lookup -- maybe a key function that involves a method lookup??
> If it defines both, it isn't clear which will be used for sorting.
> Should __lt__ take priority, or __key__? Whichever we choose, somebody
> is going to be upset and confused by the choice.
> __sort_key__ would take priority -- that is a no brainer, it's the sort key, it's used for sorting. And __lt__ giving a different result is no more surprising, and probably less surprising, than total ordering being violated in any other way.
If by no brainer you mean the performance of __sort-key__ is always better of __lt__ then I will wask for a proof in the form of benchmarks with enough use-case coverage.
> [I got very similar results as Barry with his version: about 5X faster with the key function]
> def outer_key(item):
> return item.key()
> so we get a function lookup each time it's used.
> However, I'm confused by the results -- essentially NO Change. That extra method lookup is coming essentially for free. And this example is using a tuple as the key, so not the very cheapest possible to sort key.
> Did I make a mistake? is that lookup somehow cached?
> In : run sort_key_test.py
> key 0.012529s 10000 calls
> outer_key 0.012139s 10000 calls
> lt 0.048057s 119877 calls
> each run gives different results, but the lt method is always on order of 5X slower for this size list. Sometimes out_key is faster, mostly a bit slower, than key.
> Also, I tried making a "simpler" __lt__ method:
> return (self.value1, self.value2) < (other.value1, other.value2)
> but it was bit slower than the previous one -- interesting.
This is more expensive to execute then my version for 2 reasons.
1) my __lt__ did not need to create any tuples.
2) my __lt__ can exit after only looking at the value1's
> Then I tried a simpler (but probably common) simple attribute sort:
> def __lt__(self, other):
> global lt_calls
> lt_calls += 1
> return self.value1 < other.value1
> def key(self):
> global key_calls
> key_calls += 1
> return self.value1
> And that results in about a 6X speedup
> In : run sort_key_test.py
> key 0.005157s 10000 calls
> outer_key 0.007000s 10000 calls
> lt 0.041454s 119877 calls
> time ratio: 5.922036784741144
> And, interestingly (t me!) there is even a performance gain for only a 10 item list! (1.5X or so, but still)
My guess is that this is because the __lt__ test on simple types is very fast in python.
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the Python-ideas