[Python-ideas] a sorting protocol dunder method?
Chris Barker - NOAA Federal
chris.barker at noaa.gov
Fri Dec 8 19:08:19 EST 2017
If by no brainer you mean the performance of __sort-key__ is always better
No. By no-brainer I meant that IF there is a __sort_key__ defined, then it
should be used for sorting, regardless of whether __lt__ is also defined.
(min and max should probably prefer __lt__)
I will wask for a proof in the form of benchmarks with enough use-case
Indeed — I was surprised that it helped so much, and didn’t seem to hurt
for the one example.
But the greater concern is that this will effect every sort (that doesn’t
provide a key function) so if there is any noticeable performance hit, that
probably kills the idea.
And the most risky example is lists of ints or floats — which are very fast
to compare. So adding a method lookup could be a big impact.
I’m not sure how to benchmark that without hacking the sorting C code
I’d still love to know if my benchmark attempt was at all correct for
custom classes in any case.
so we get a function lookup each time it's used.
However, I'm confused by the results -- essentially NO Change. That extra
method lookup is coming essentially for free. And this example is using a
tuple as the key, so not the very cheapest possible to sort key.
Did I make a mistake? is that lookup somehow cached?
In : run sort_key_test.py
key 0.012529s 10000 calls
outer_key 0.012139s 10000 calls
lt 0.048057s 119877 calls
each run gives different results, but the lt method is always on order of
5X slower for this size list. Sometimes out_key is faster, mostly a bit
slower, than key.
Also, I tried making a "simpler" __lt__ method:
return (self.value1, self.value2) < (other.value1, other.value2)
but it was bit slower than the previous one -- interesting.
This is more expensive to execute then my version for 2 reasons.
1) my __lt__ did not need to create any tuples.
2) my __lt__ can exit after only looking at the value1's
Then I tried a simpler (but probably common) simple attribute sort:
def __lt__(self, other):
lt_calls += 1
return self.value1 < other.value1
key_calls += 1
And that results in about a 6X speedup
In : run sort_key_test.py
key 0.005157s 10000 calls
outer_key 0.007000s 10000 calls
lt 0.041454s 119877 calls
time ratio: 5.922036784741144
And, interestingly (t me!) there is even a performance gain for only a 10
item list! (1.5X or so, but still)
My guess is that this is because the __lt__ test on simple types is very
fast in python.
Python-ideas mailing list
Python-ideas at python.org
Code of Conduct: http://python.org/psf/codeofconduct/
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the Python-ideas