[Python-Dev] Sorting

Tim Peters tim.one@comcast.net
Sun, 21 Jul 2002 01:19:03 -0400


[Aahz]
> Any reason the list object can't grow a .stablesort() method?

I'm not sure.  Python's samplesort implementation is right up there among
the most complicated (by any measure) algorithms in the code base, and the
mergesort isn't any simpler anymore.  Yet another large mass of difficult
code can make for a real maintenance burden after I'm dead.  Here, guess
what this does:

static int
gallop_left(PyObject *pivot, PyObject** p, int n, PyObject *compare)
{
	int k;
	int lo, hi;
	PyObject **pend;

	assert(pivot && p && n);
	pend = p+(n-1);
	lo = 0;
	hi = -1;
	for (;;) {
		IFLT(*(pend - lo), pivot)
			break;
		hi = lo;
		lo = (lo << 1) + 1;
		if (lo >= n) {
			lo = n;
			break;
		}
	}
	lo = n - lo;
	hi = n-1 - hi;
	while (lo < hi) {
		int m = (lo + hi) >> 1;
		IFLT(p[m], pivot)
			lo = m+1;
		else
			hi = m;
	}
	return lo;
fail:
	return -1;
}

There are 12 other functions that go into this, some less obscure, some
more.  Change "hi = -1" to "hi = 0" and you'll get a core dump, etc; it's
exceedingly delicate, and because truly understanding it essentially
requires doing a formal correctness proof, it's difficult to maintain; fight
your way to that understanding, and you'll know why it sorts, but still
won't have a clue about why it's so fast.  I'm disinclined to add more code
of this nature unless I can use it to replace code at least as difficult
(which samplesort is).

An irony is that stable sorts are, by definition, pointless unless you *do*
have equal elements, and the many-equal-elements case is the one known case
where the new algorithm is much slower than the current one (indeed, I have
good reason to suspect it's the only such case, and reasons beyond just that
God loves a good joke <wink>).

It's OK by me if this were to become Python's only sort.  Short of that, I'd
be happier contributing the code to a sorting extension module.  There are
other reasons the latter may be a good idea; e.g., if you know you're
sorting C longs, it's not particularly difficult to do that 10x faster than
Python's generic list.sort() can do it; ditto if you know you're comparing
strings; etc.  Exposing the binary insertion sort (which both samplesort and
mergesort use) would also be useful to some people (it's a richer variant of
bisect.insort_right).  I'd prefer that Python-the-language have just one
"really good general sort" built in.