[Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

Elliot Gorokhovsky elliot.gorokhovsky at gmail.com
Sun Mar 5 16:19:32 EST 2017


On Sun, Mar 5, 2017 at 10:51 AM David Mertz <mertz at gnosis.cx> wrote:

> Thanks for the hard work, this looks very promising.
>

Thank you!


> Real world data tends to be mostly-sorted.  So it would be useful for your
> benchmarks to include:
>
> A) Performance on completely sorted data
>   i) Of homogenous type
>   ii) Of mixed type
> B) Performance on mostly-sorted data
>   i) Of homogenous type
>   ii) Of mixed type
>

You are entirely correct, as the benchmarks below demonstrate. I used the
benchmark lists from Objects/listsort.txt, which are:

      \sort: descending data
      /sort: ascending data
      3sort: ascending, then 3 random exchanges
      +sort: ascending, then 10 random at the end
      %sort: ascending, then randomly replace 1% of elements w/ random
values
      ~sort: many duplicates
      =sort: all equal

My results are below (the script can be found at
https://github.com/embg/python-fastsort-benchmark/blob/master/bench-structured.py
):
Homogeneous ([int]):
\sort: 54.6%
/sort: 56.5%
3sort: 53.5%
+sort: 55.3%
%sort: 52.4%
~sort: 48.0%
=sort: 45.2%

Heterogeneous ([int]*n + [0.0]):
\sort: -17.2%
/sort: -19.8%
3sort: -18.0%
+sort: -18.8%
%sort: -10.0%
~sort: -2.1%
=sort: -21.3%

As you can see, because there's a lot less non-comparison overhead in the
structured lists, the impact of the optimization is much greater, both in
performance gain and in worst-case cost. However, I would argue that these
data do not invalidate the utility of my patch: the probability of
encountering a type-heterogeneous list is certainly less than 5% in
practice. So the expected savings, even for structured lists, is something
like (5%)(-20%) + (95%)(50%) = 46.5%. And, of course, not *all* the lists
one encounters in practice are structured; certainly not *this* structured.
So, overall, I would say the numbers above are extremely encouraging.
Thanks for pointing out the need for this benchmark, though!

Elliot
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20170305/fc9d468a/attachment-0001.html>


More information about the Python-ideas mailing list