[Python-Dev] timsort for jython

Mon, 05 Aug 2002 12:06:28 -0400

[/F]
> sounds like yet another reason to add two methods; one that
> guarantees stability, and one that doesn't.

I haven't heard the first reason, only people latching on to that a
distinction *can* be drawn, "so therefore it must be" (or something like
that ...).  The only portable way you can get stability is to do the DSU
business anyway.

> the only counter-argument I've seen from you is code bloat, but
> I cannot see what stops us from mapping *both* methods to a
> single implementation in CPython 2.3.

I passed that suggestion on in the patch report, when I asked Guido to
Pronounce, and he didn't want that.  Perl 5.8 "has a sort pragma for limited
control of the sort ... [which] may not persist into future perls",
according to <http://www.perldoc.com/perl5.7.3/pod/func/sort.html>.  Maybe
we should do that too <wink>.

> an alternative would be to add a sortlib module:
>
>     $ more Lib/sortlib.py
>
>     def stablesort(list):
>         list.sort() # 2.3's timsort is stable!
>
> and a regression test script that makes sure that it really is stable
> (can a test program ever be sure?)

I've suggested before that you may very well want to use DSU indices even if
you *know* the underlying sort is stable, in order to prevent massive
increase in sort time due to equal keys falling back to comparing records
(some sorts from Kevin Altis's database showed that dramatically).  So the
use cases for relying on stability *in Python* aren't all that clear:
passing an explicit comparison function is way slower, but sorting (key,
record) tuples instead is also prone to major slowdown surprises.  Sorting
(key, index, record) tuples remains your safest bet (unless you don't care
about speed).

So I'd like to see some real use cases.  An appropriate design for a sortlib
module may (or may not) suggest itself then.

BTW, list.sort() is stable in CPython iff

    [].sort.__doc__.find('stable')

is true.  Short of that, the stability test in Lib/test/test_sort.py will
almost certainly determine whether it's stable (not 100% certain, but
99.999999% easy <wink>).