Port Powersort Improvement of Timsort

Hi all, I'm working with the University of Liverpool to implement Munro & Wild's Powersort in various libraries. Powersort has been included in CPython 3.11 and PyPy as a replacement for Timsort (https://github.com/python/cpython/issues/78742), therefore porting Powersort to numpy is a logical next step. Compared to Timsort's original merge policy, Powersort finds a near-optimal order of merges without measurable overhead and so seems like a no-detriment improvement. (See also benchmarks referenced in the GitHub issue). I wonder if you'd be receptive to a PR bringing the same change to numpy. I do have an existing implementation in C++ to work from (https://github.com/sebawild/powersort), in addition to the C implementations in CPython and PyPy and am willing to invest time into porting Powersort for numpy. Being a new contributor to this project, any pointers to relevant processes, documentation, and source files I should bear in mind would greatly appreciated. Kind regards, Ben

Hi Ben, Thanks for your proposal! On Wed, Jun 14, 2023, at 03:24, Ben Weston wrote:
I wonder if you'd be receptive to a PR bringing the same change to numpy. I do have an existing implementation in C++ to work from (https://github.com/sebawild/powersort), in addition to the C implementations in CPython and PyPy and am willing to invest time into porting Powersort for numpy.
Powersort looks interesting. I see in our docs: ``` kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, optional Sorting algorithm. The default is 'quicksort'. Note that both 'stable' and 'mergesort' use timsort or radix sort under the covers and, in general, the actual implementation will vary with data type. The 'mergesort' option is retained for backwards compatibility. ``` Given that timsort is used "underneath the hood", it may be an option to improve it, especially given that it does not change fundamental aspects such as stability. You can take a look at src/npysort/timsort.cpp to see what we currently have (a whole bunch of code for handling various object types). Best regards, Stéfan
participants (2)
-
Ben Weston
-
Stefan van der Walt