[Python-Dev] Optimizing list.sort() by checking type in advance

Elliot Gorokhovsky elliot.gorokhovsky at gmail.com
Mon Oct 10 02:18:35 EDT 2016


Hi,

I posted here a while back asking what you guys thought about implementing
radix sort for strings in list.sort(). You gave me a lot of reasons why
that would be a bad idea. However, it got me thinking, and I came up with
something that you may find interesting.

First, some simple benchmark results (numbers are seconds, check out the
extension module at https://github.com/embg/python-fast-listsort.git):

*** 1e3 ints ***
F.fastsort(): 0.00018930435180664062
F.sort(): 0.0002830028533935547
*** 1e3 strings ***
F.fastsort(): 0.0003533363342285156
F.sort(): 0.00044846534729003906
*** 1e7 ints ***
F.fastsort(): 5.479267358779907
F.sort(): 8.063318014144897
*** 1e7 strings ***
F.fastsort(): 9.992833137512207
F.sort(): 13.730914115905762

The optimization uses the fact that, in practice, we are almost always
sorting keys of the same type (note: not objects of the same type, *keys*
of the same type; we could have a complex key function like str that takes
many different types, but the keys are still all the same type). So we can
just do one typecheck up front and then do unsafe comparisons during the
sort (if our initial check passes). Specifically, we check that for all the
PyObject* in saved_ob_item, the ob->ob_type are the same. Then we replace
PyObject_RichCompare with ob_type->tp_richcompare. Additionally, we can
check for the very common cases of ints and strings and give optimized
comparison functions for those cases. It might not seem like this would
save a lot of time, but it turns out that PyObject_RichCompare is a massive
clusterf**k that has to test tons of type stuff before it can actually
compare. Cutting that out ends up saving a *lot* of time, as the benchmark
demonstrates.

What do you think? I'm planning on writing this up into a patch, but wanted
to get some feedback on the implementation and ideas for improvement first.

Thanks,
Elliot
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20161010/1feca58f/attachment.html>


More information about the Python-Dev mailing list