[Python-Dev] Drastically improving list.sort() for lists of strings/ints
Terry Reedy
tjreedy at udel.edu
Sun Sep 11 16:48:50 EDT 2016
On 9/11/2016 3:45 AM, Elliot Gorokhovsky wrote:
> Hello all,
>
> I am interested in making a non-trivial improvement to list.sort(),
This is non-trivial, and harder than you seem to think it is.
> but
> before I put in the work, I want to test the waters and see if this is
> something the community would accept.
The debate on proposed enhancements is usually whether they are really
enhancements, all things considered. For special-case speedups, the
'all things' include the frequency of the special cases, the ease of
detecting them, the thoroughness of testing, and the maintainability of
the proposed, and likely more complicated, code.
> Basically, I want to implement radix sort for lists of strings.
Radix sort was invented for fixed-length strings of digits, as in
all-digit id 'numbers', so 10 bins. ASCII strings need 128 bins,
general byte strings need 256, still manageable. General Unicode
strings require 1,114,112 bins, most of which will be empty for most
character positions. This is harder to manage. So are variable-length
strings.
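
For concreteness, here is a minimal sketch (names are mine, not
anything in CPython) of an MSD radix sort over equal-length byte
strings, using the 256 bins mentioned above:

# Minimal MSD (most-significant-digit-first) radix sort sketch for
# equal-length byte strings: 256 bins, one per possible byte value.

def msd_radix_sort(items, pos=0):
    """Sort a list of equal-length bytes objects by byte position."""
    if len(items) < 2 or pos >= len(items[0]):
        return items
    bins = [[] for _ in range(256)]      # one bin per byte value
    for s in items:
        bins[s[pos]].append(s)           # distribute by current byte
    result = []
    for b in bins:                       # recurse on each non-empty bin
        if b:
            result.extend(msd_radix_sort(b, pos + 1))
    return result

assert msd_radix_sort([b"cab", b"abc", b"bca"]) == [b"abc", b"bca", b"cab"]

Variable-length strings need extra handling (e.g. a sentinel bin for
"string ended here") that this sketch omits.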
In CPython 3.3+, strings at the C level are stored as 1, 2, or 4
bytes per character. So you could specifically target lists
of bytes (and bytearrays) and lists of strings limited to 1-byte
characters. The same code should pretty much work for both.
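
At the Python level you can approximate that C-level check; a rough
sketch (not how a C patch would actually detect it):

# Pure-Python approximation: would every string in the list fit
# CPython's 1-byte (Latin-1) internal representation?

def all_one_byte(strings):
    return all(not s or ord(max(s)) < 256 for s in strings)

print(all_one_byte(["spam", "café"]))   # True: both fit in 1 byte/char
print(all_one_byte(["spam", "日本"]))    # False: needs 2 bytes per char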
> ...
> In-place radix sort is significantly faster for lexicographic sorting than
> Timsort (or in general any comparison-based sort, since radix can beat
> the nlogn barrier).
This unqualified statement is doubly wrong.
First, with respect to sorting in general: 'asymptotically faster' only
means faster for 'large enough' tasks. Many real world tasks are small,
and big tasks get broken down into multiple small tasks.
'Asymptotically slower' algorithms may be faster for small tasks. Tim
Peters investigated and empirically determined that an O(n*n) binary
insort, as he optimized it on real machines, is faster than O(n*logn)
sorting for up to around 64 items. So timsort uses binary insertion to
sort up to 64 items. Similarly, you would have to investigate whether
there is a size below which timsort is faster.
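
For reference, binary insertion sort is simple to sketch in Python
(timsort's C version is far more heavily optimized than this):

import bisect

# Binary insertion sort: binary search finds the insertion point
# (O(log n) comparisons per item), then the item is slid into place.
# Timsort uses this technique for runs below its small-size cutoff.

def binary_insertion_sort(a):
    for i in range(1, len(a)):
        x = a[i]
        j = bisect.bisect_right(a, x, 0, i)   # where x belongs in a[:i]
        a[j+1:i+1] = a[j:i]                   # shift the tail right
        a[j] = x
    return a

assert binary_insertion_sort([3, 1, 2, 1]) == [1, 1, 2, 3]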
Second, with respect to timsort in particular: timsort is designed to
exploit structure and run faster than O(n*logn) in special cases. If a
list is already sorted, timsort will do one O(n) scan and stop. Any
radix sort will take several times longer. If a list is reverse sorted,
timsort will do one O(n) scan and do an O(n) reverse. If a list is the
concatenation of two sorted lists, timsort will find the two sorted
sublists and merge them. If a sorted list has unsorted items appended
to the end, timsort will sort the appended items and then do a merge. I
expect any radix sort to be slower for all these cases. Tim Peters
somewhere documented his experiments and results with various special
but plausible cases of non-randomness.
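
This is easy to see for yourself; a quick (unscientific) experiment
comparing structured and random input of the same size:

import random, timeit

# Timsort on structured vs. random input. On sorted or reverse-sorted
# data it does O(n) comparisons, so the structured cases should come
# out far faster than the random one.

n = 10**6
data = {
    "random":   [random.random() for _ in range(n)],
    "sorted":   list(range(n)),
    "reversed": list(range(n, 0, -1)),
}
for name, lst in data.items():
    t = timeit.timeit(lambda: sorted(lst), number=5)
    print("%-8s %.3fs" % (name, t))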
What you might propose is this: if the initial up or down run
already detected by timsort is shorter than, say, 64 items, and the
length of the list is more than some empirically determined value,
and all items are bytes or 1-byte-char strings (with further checks
confirming this), then switch to rsort. But you should start by
putting a module on PyPI.
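
In pure Python, such a gate might look like this sketch, where the
thresholds are placeholders to be determined empirically and
radix_sort is a stand-in for the hypothetical new code path:

MIN_LIST_LEN = 1024   # placeholder: below this, timsort likely wins
MAX_RUN_LEN = 64      # placeholder: a long initial run favors timsort

def radix_sort(lst):
    return sorted(lst)   # stub for the hypothetical radix implementation

def initial_run_length(lst):
    # length of the leading non-decreasing run, as timsort would find it
    i = 1
    while i < len(lst) and lst[i - 1] <= lst[i]:
        i += 1
    return i

def maybe_radix_sort(lst):
    if (len(lst) >= MIN_LIST_LEN
            and initial_run_length(lst) < MAX_RUN_LEN
            and all(isinstance(s, bytes) for s in lst)):
        return radix_sort(lst)
    return sorted(lst)   # fall back to timsort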
--
Terry Jan Reedy