[Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

Nick Timkovich prometheus235 at gmail.com
Sun Mar 5 22:28:56 EST 2017


I see the benchmarks, and while I assume the asymptotic complexity is the
same, is there a longer "start-up" time your optimizations need? Do you
have benchmarks where you sort 10, 100...10**6 items to show that beyond
the types you're sorting, you're not amortizing any increased overhead out
to oblivion?

On Sun, Mar 5, 2017 at 9:24 PM, Elliot Gorokhovsky <
elliot.gorokhovsky at gmail.com> wrote:

> On Sun, Mar 5, 2017 at 8:20 PM David Mertz <mertz at gnosis.cx> wrote:
>
>>
>> B. I think a very large percentage of lists are heterogeneous.  But most
>> of the time when they are, it's not because there are several different
>> numeric types but rather because a list collects some sort of custom
>> objects.  Maybe those even all share some ancestor, but the point is they
>> are user/library defined classes that won't have a fast comparison like
>> ints or floats do.
>>
>
> As long as the ob_type for all the objects is the same, my patch will sort
> significantly faster, as it replaces PyObject_RichCompareBool with
> ob_type->tp_richcompare. It doesn't matter if your list is builtin-types or
> not, as long as the types are all the same, you get a speedup (though it's
> greater for built-in types).
>
> Also, I actually wrote a crawler to see how many PyPI packages implement a
> custom compare function (like you would have to for user-defined types).
> The answer is less than 6%. Code: https://github.com/embg/
> python-fast-listsort/tree/master/crawler
>
>
>> B (part 2): I haven't actually read his patch, but I'm assuming that
>> Elliot's approach can start scanning the list, and as soon as it find a
>> complex/custom object at index 0 ignore the rest of the scan.  So for that
>> case, there is very little harm in his linear scan (over one item).
>>
>
> Yup, the pre-sort check breaks out of the loop as soon as it sees
> heterogeneity. So unless your float is at the end of the whole list of ints
> (as in my worst-case benchmark), the cost is basically 0.
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20170305/901a688f/attachment.html>


More information about the Python-ideas mailing list