Re: [Python-ideas] [Python-Dev] Drastically improving list.sort() for lists of strings/ints

[redirected from python-dev, to python-ideas; please send followups only to python-ideas] [Elliot Gorokhovsky <elgo8537@colorado.edu>]
It will be fun to find out ;-) As Mark, and especially Terry, pointed out, a major feature of the current sort is that it can exploit many kinds of pre-existing order. As the paper you referenced says, "Realistic sorting problems are usually far from random." But, although they did run some tests against data with significant order, they didn't test against any algorithms _aiming_ at exploiting uniformity. Just against their radix sort variants, and against a quicksort. That's where it's likely next to impossible to guess in advance whether radix sort _will_ have a real advantage. All the kinds of order the current sort can exploit are far from obvious, because the mechanisms it employs are low-level & very general. For example, consider arrays created by this function, for even `n`: def bn(n): r = [None] * n r[0::2] = range(0, n//2) r[1::2] = range(n//2, n) return r Then, e.g.,
bn(16) [0, 8, 1, 9, 2, 10, 3, 11, 4, 12, 5, 13, 6, 14, 7, 15]
This effectively takes range(n), cuts it in half, and does a "perfect shuffle" on the two halves. You'll find nothing in the current code looking for this as a special case, but it nevertheless sorts such arrays in "close to" O(n) time, and despite that there's no natural run in the input longer than 2 elements. That said, I'd encourage you to write your code as a new list method at first, to make it easiest to run benchmarks. If that proves promising, then you can worry about how to make a single method auto-decide which algorithm to use. Also use the two-array version. It's easier to understand and to code, and stability is crucial now. The extra memory burden doesn't bother me - an array of C pointers consumes little memory compared to the memory consumed by the Python objects they point at. Most of all, have fun! :-)
participants (1)
-
Tim Peters