[Python-Dev] Sorting
Tim Peters
tim.one@comcast.net
Sat, 20 Jul 2002 02:18:16 -0400
An enormous amount of research has been done on sorting since the last time
I wrote a sort for Python. Major developments have been in two areas:
1. Adaptive sorting. Sorting algorithms are usually tested on random
data, but in real life you almost never see random data. Python's
sort tries to catch some common cases of near-order via special-
casing. The literature has since defined more than 15 formal
measures of disorder (one simple example is sketched after this
list), and developed algorithms provably optimal in the face of one
or more of them. But this is O() optimality, and
theoreticians aren't much concerned about how big the constant
factor is. Some researchers are up front about this, and toward
the end of one paper with "practical" in its title, the author was
overjoyed to report that an implementation was only twice as slow
as a naive quicksort <wink>.
2. Pushing the worst-case number of comparisons closer to the
information-theoretic limit (ceiling(log2(N!))).
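For concreteness, here's a tiny sketch of one such measure of disorder,
the number of ascending runs (closely related to the "Runs" measure in the
presortedness literature; the function name is made up for illustration):

    def count_runs(xs):
        # Number of maximal non-descending runs: 1 for sorted input,
        # roughly N/2 for random input, N for strictly descending input.
        if not xs:
            return 0
        runs = 1
        for i in range(1, len(xs)):
            if xs[i] < xs[i - 1]:   # a descent starts a new run
                runs += 1
        return runs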
I don't care much about #2 -- in experiments conducted when it was new, I
measured the # of comparisons our samplesort hybrid did on random inputs,
and it was never more than 2% over the theoretical lower bound, and
typically closer. As N grows large, the expected case provably converges to
the theoretical lower bound. There remains a vanishingly small chance for a
bad case, but nobody has reported one, and at the time I gave up trying to
construct one.
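If you want to repeat that kind of measurement against whatever sort your
Python has (this is just a sketch, not the harness I used), a crude way is
to count calls to __lt__ and compare the total to ceiling(log2(N!)):

    import math
    import random

    class CountingFloat(float):
        # float subclass that counts how many times sort() asks "a < b".
        compares = 0
        def __lt__(self, other):
            CountingFloat.compares += 1
            return float.__lt__(self, other)

    n = 2 ** 15
    data = [CountingFloat(random.random()) for _ in range(n)]
    CountingFloat.compares = 0
    data.sort()

    # ceiling(log2(n!)), via lgamma to avoid computing a huge factorial.
    lower_bound = math.ceil(math.lgamma(n + 1) / math.log(2))
    print(CountingFloat.compares, lower_bound)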
Back on Earth, among Python users the most frequent complaint I've heard is
that list.sort() isn't stable. Alex is always quick to trot out the
appropriate DSU (Decorate Sort Undecorate) pattern then, but the extra
memory burden for that can be major (a new 2-tuple per list element costs
about 32 bytes, then 4 more bytes for a pointer to it in a list, and 12 more
bytes that don't go away to hold each non-small index).
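For reference, one way to spell that DSU pattern (a generic sketch, not
anyone's library code) is:

    def dsu_sort(items, key):
        # Decorate: one (key, original position) 2-tuple per element.
        decorated = [(key(item), i) for i, item in enumerate(items)]
        # Sort: ties on the key fall back to the original position, so
        # equal keys keep their relative order and the items themselves
        # are never compared.
        decorated.sort()
        # Undecorate: rebuild the list in the new order.
        return [items[i] for _, i in decorated]

The index in each tuple is exactly the "non-small index" charged above;
it's what buys stability.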
After reading all those papers, I couldn't resist taking a crack at a new
algorithm that might be practical, and now have something you might call a
non-recursive adaptive stable natural mergesort / binary insertion sort
hybrid. In playing with it so far, it has two bad aspects compared to our
samplesort hybrid:
+ It may require temp memory, up to 2*N bytes worst case (one pointer each
for no more than half the array elements).
+ It gets *some* benefit for arrays with many equal elements, but not
nearly as much as I was able to hack samplesort to get. In effect,
partitioning is very good at moving equal elements close to each other
quickly, but merging leaves them spread across any number of runs.
This is especially irksome because we're sticking to Py_LT for
comparisons, so we can't even detect a == b without comparing a and b
twice, and then deducing equality from not (a < b) and not (b < a)
(a tiny sketch of that deduction follows this list). Given the
relatively huge cost of comparisons, it's a timing disaster to
compare twice unless it falls out naturally. It was fairly natural
to do so in samplesort, but not at all in this sort.
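(The only equality test available under Py_LT is, in Python terms,
something like the sketch below -- two full comparisons just to conclude
that two items sort as equal.)

    def sorts_as_equal(a, b):
        # Under Py_LT we only get to ask "a < b?", so "equal" has to be
        # deduced from two comparisons: neither is less than the other.
        return not (a < b) and not (b < a)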
It also has good aspects:
+ It's stable (items that compare equal retain their relative order, so,
e.g., if you sort first on zip code, and a second time on name, people
with the same name still appear in order of increasing zip code; this
is important in apps that, e.g., refine the results of queries based
on user input).
+ The code is much simpler than samplesort's (but I think I can
fix that <wink>).
+ It gets benefit out of more kinds of patterns, and without lumpy
special-casing (a natural mergesort has to identify ascending and
descending runs regardless, and then the algorithm builds on just
that; a rough sketch of run detection follows this list).
+ Despite that I haven't micro-optimized it, in the random case it's
almost as fast as the samplesort hybrid. In fact, it might
have been a bit faster had I run tests yesterday (the samplesort
hybrid got sped up by 1-2% last night). This one surprised me the
most, because at the time I wrote the samplesort hybrid, I tried
several ways of coding mergesorts and couldn't make it as fast.
+ It has no bad cases (O(N log N) is worst case; N-1 compares is best).
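A rough sketch of the run detection a natural mergesort starts from
(names made up, not the actual implementation); descending runs are
reversed on the spot so everything downstream sees only ascending runs:

    def count_run(xs, lo, hi):
        # Return the length of the natural run starting at xs[lo],
        # looking no further than xs[hi-1].  A run is either
        # non-descending (xs[lo] <= xs[lo+1] <= ...) or strictly
        # descending (xs[lo] > xs[lo+1] > ...).  Descending runs must be
        # strict so reversing them can't swap equal elements and break
        # stability.  Only "<" is used, as with Py_LT.
        i = lo + 1
        if i == hi:
            return 1
        if xs[i] < xs[i - 1]:               # strictly descending run
            while i < hi and xs[i] < xs[i - 1]:
                i += 1
            xs[lo:i] = xs[lo:i][::-1]       # reverse it in place
        else:                               # non-descending run
            while i < hi and not (xs[i] < xs[i - 1]):
                i += 1
        return i - lo

Presumably the binary insertion sort half of the hybrid then boosts runs
that come back too short before any merging happens.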
Here are some typical timings, taken from Python's sortperf.py, over
identical lists of floats:
Key:
*sort: random data
\sort: descending data
/sort: ascending data
3sort: ascending data but with 3 random exchanges
~sort: many duplicates
=sort: all equal
!sort: worst case scenario
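Roughly, the inputs look like this (a loose paraphrase of the key above,
not sortperf.py itself; !sort's construction isn't described here, so it's
left out):

    import random

    def make_input(kind, n):
        # Loose stand-ins for the data patterns described in the key.
        xs = [random.random() for _ in range(n)]
        if kind == '*':                     # random data
            return xs
        if kind == '\\':                    # descending data
            return sorted(xs, reverse=True)
        if kind == '/':                     # ascending data
            return sorted(xs)
        if kind == '3':                     # ascending, then 3 random exchanges
            xs = sorted(xs)
            for _ in range(3):
                i, j = random.randrange(n), random.randrange(n)
                xs[i], xs[j] = xs[j], xs[i]
            return xs
        if kind == '~':                     # many duplicates
            return [random.choice(xs[:4]) for _ in range(n)]
        if kind == '=':                     # all equal
            return [xs[0]] * n
        raise ValueError(kind)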
The !sort case was a worst case for the last quicksort Python had before
it grew the samplesort, and it was a very bad case for that. By sheer
coincidence, it turns out to be an exceptionally good case for the
experimental sort:
samplesort
 i    2**i  *sort  \sort  /sort  3sort  ~sort  =sort  !sort
15   32768   0.13   0.01   0.01   0.10   0.04   0.01   0.11
16   65536   0.24   0.02   0.02   0.23   0.08   0.02   0.24
17  131072   0.54   0.05   0.04   0.49   0.18   0.04   0.53
18  262144   1.18   0.09   0.09   1.08   0.37   0.09   1.16
19  524288   2.58   0.19   0.18   2.34   0.76   0.17   2.52
20 1048576   5.58   0.37   0.36   5.12   1.54   0.35   5.46
timsort
 i    2**i  *sort  \sort  /sort  3sort  ~sort  =sort  !sort
15   32768   0.16   0.01   0.02   0.05   0.14   0.01   0.02
16   65536   0.24   0.02   0.02   0.06   0.19   0.02   0.04
17  131072   0.55   0.04   0.04   0.13   0.42   0.04   0.09
18  262144   1.19   0.09   0.09   0.25   0.91   0.09   0.18
19  524288   2.60   0.18   0.18   0.46   1.97   0.18   0.37
20 1048576   5.61   0.37   0.35   1.00   4.26   0.35   0.74
If it weren't for the ~sort column, I'd seriously suggest replacing the
samplesort with this. 2*N extra bytes isn't as bad as it might sound, given
that, in the absence of massive object duplication, each list element
consumes at least 12 bytes (type pointer, refcount and value) + 4 bytes for
the list pointer. Add 'em all up and that's a 13% worst-case temp memory
overhead.
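Spelling out that arithmetic with the 32-bit per-element sizes above:

    object_bytes = 12     # type pointer + refcount + value, per float
    list_slot_bytes = 4   # the list's own pointer to the object
    temp_bytes = 2        # worst case: a 4-byte pointer for at most
                          # half the elements, i.e. 2*N bytes in all
    print(temp_bytes / (object_bytes + list_slot_bytes))  # 0.125, ~13%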