[Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

Barry Scott barry at barrys-emacs.org
Thu Mar 9 05:04:44 EST 2017


> On 8 Mar 2017, at 22:58, Elliot Gorokhovsky <elliot.gorokhovsky at gmail.com> wrote:
> 
> On Wed, Mar 8, 2017 at 2:14 PM Barry <barry at barrys-emacs.org <mailto:barry at barrys-emacs.org>> wrote:
> Can you assume that the list is of type(list[0]) and use that type's optimised sort?
> But in the optimised sort code check that the types are as required.
> If you hit an element that is not of the required type then fall back to the unoptimised sort.
> 
> Well, how would you tell if you've hit an element that is not of the required type? You'd have to check during every compare, right? And that's precisely what we're trying to avoid!

It seems the trick for optimisation is to compare the type pointer of an object to see if it's the same as a type supported by the chosen optimised sort.

It was not clear to me that you need to scan the list at the start to make sure it's homogeneous. Given that the type check is so cheap, will it
slow the sort if you do the pointer check in the compare code? I am not suggesting you run the full-fat rich compare on each compare.

> The whole point of my patch is that we do O(nlogn) compares, but only have O(n) elements, so it's much cheaper to do all the type checks in advance, and in the very likely case that our list is homogeneous, switch to an optimized special-case compare function.
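If I understand the pre-sort check correctly, it amounts to something like the following in pure Python. This is only a sketch: the actual patch does this at the C level by comparing type pointers, and the function name here is made up for illustration.

```python
def homogeneous_type(lst):
    """Return the common type of all elements, or None if the list
    is empty or heterogeneous. One identity check per element: O(n)."""
    if not lst:
        return None
    t = type(lst[0])
    for x in lst[1:]:
        if type(x) is not t:
            return None
    return t

# If the pre-scan finds a single type, the real patch swaps in a
# specialised compare function (e.g. for int or str) before sorting;
# otherwise it falls back to the generic rich compare.
```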

So with my suggestion you do O(n log n) * 2 pointer compares, it seems? Whereas the pre-scan only needs O(n) pointer checks, which is smaller?
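To put rough numbers on that comparison (assuming about n * log2(n) compares in a comparison sort, and two pointer checks per compare, which are assumptions for illustration rather than exact figures from the patch):

```python
import math

def check_counts(n):
    """Rough count of type checks under each strategy, for a list of n
    elements. Assumes ~n*log2(n) compares and 2 pointer checks per
    compare; real constants will differ."""
    per_compare = 2 * n * math.log2(n)  # checking inside every compare
    pre_scan = n                        # one check per element, up front
    return per_compare, pre_scan

per, pre = check_counts(1024)
```

For n = 1024 this gives roughly 20480 checks done inside the compares versus 1024 for the single pre-scan, which is the asymptotic gap Elliot is describing.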

> 
> Even when we only do O(n) compares, my patch is still much faster (see benchmarks higher up in this thread). Why? Well, if you're doing the type checks during the compares, you're doing them across different function calls, with other stuff interspersed in between. So pipeline/branch prediction/caching is less able to optimize away the overhead of the safety checks (I don't know how CPUs work, but one of those is relevant here). With the pre-sort check, it's all in a single iteration through the list, and we're taking the same exact branch every time; it's much faster. Think iterating over a matrix row-wise versus iterating column-wise (not exactly relevant here since that's about cache, but that's the idea. Again, I don't really know how CPUs work).

Provided the code is small I think both versions of the algorithm will benefit from cache and branch prediction.

> 
> So in short: no, we can't just check as we go, because that's what we're already doing. But we can optimize significantly by checking in advance. I mean, practically speaking, the advance check is basically free. The compare-time checks, in sum, are not, both asymptotically and practically.

I'm not clear this is true. But I have not read the code.

Barry