list.sort, was Re: [Python-Dev] decorate-sort-undecorate

Tim Peters tim.one at comcast.net
Sat Nov 15 03:26:52 EST 2003


[Armin Rigo]
>> from heapq import *
>> def isorted(iterable):
>>     heap = list(iterable)
>>     heapify(heap)
>>     while heap:
>>         yield heappop(heap)
>>
>> This generator is similar to the new list.sorted() but starts
>> yielding elements after only O(n) operations (in heapify).
>> ...

[Raymond Hettinger]
> How much of the iterator can be consumed before it becomes preferable
> (in terms of speed and memory) to have used iter(list.sort())?
>
> My guess is that the break-even point for speed is around 10%
> depending on how much order already exists in the underlying list.

This depends so much on the speed of the heap implementation.  When it gets
into the log-time part, a high multiplicative constant due to fixed
overheads makes a slow heap run like a fast heap would if the latter were
working on an *exponentially* larger list.
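
A quick way to see the "exponentially larger list" remark (my own
arithmetic, not part of the original discussion): a multiplicative
constant c on a log-time operation is the same as running the fast
version on an input of size n**c, since c*log2(n) == log2(n**c).

    import math

    # A constant factor of c on log2(n) equals log2(n ** c):
    n, c = 1_000_000, 3
    print(c * math.log2(n))    # ~59.79
    print(math.log2(n ** c))   # ~59.79, i.e. like a list of n**3 items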

I just tried on my laptop, under 2.3.2, with lists of a million random
floats.  That's a bad case for list.sort() (there's no order to exploit, and
it wastes some compares trying to find order to exploit), and is an average
case for a heapsort.  Even if I only asked for *just* the first element of
the sorted result, using sort() and peeling off the first element was about
25% faster than using heapify followed by one heappop.

That says something about how dramatic the overheads are in calling
Python-coded heap functions (well, it also says something about the amount
of effort I put into optimizing list.sort() <wink>).
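
For anyone who wants to poke at this themselves, here's a rough sketch of
that kind of measurement (the function names and harness are mine, and the
numbers will of course differ from a 2003-era 2.3.2 laptop):

    import heapq
    import random
    import time

    N = 1_000_000

    def first_k_by_sort(data, k):
        # Sort a copy, then peel off the first k elements.
        data = list(data)
        data.sort()
        return data[:k]

    def first_k_by_heap(data, k):
        # Heapify a copy, then pop the first k elements.
        heap = list(data)
        heapq.heapify(heap)
        return [heapq.heappop(heap) for _ in range(k)]

    random.seed(12345)
    base = [random.random() for _ in range(N)]

    for k in (1, N // 10, N // 2):
        for func in (first_k_by_sort, first_k_by_heap):
            start = time.perf_counter()
            func(base, k)
            print(f"{func.__name__:17s} k={k:>7}: "
                  f"{time.perf_counter() - start:.3f}s")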

There are deeper problems the heap approach has to fight:

1. A heapsort does substantially more element compares than a
   mergesort, and element compares are expensive in Python, so
   that's hard to overcome.

2. Heapsort has terrible spatial locality, and blowing the cache
   becomes even more important than comparison speed as the number
   of elements grows large.

   One of the experiments I did when writing the 2.3 sort was to
   compare a straight mergesort to an enhanced version of "weak-
   heap sort".  Both of those do close to the theoretical minimum
   number of compares on random data.  Despite that the mergesort
   moved more memory around, the always-sequential data access in
   the mergesort left it much faster than the cache-hostile weak-
   heap sort.  A regular heapsort isn't as cache-hostile as a
   weak-heap sort, but it's solidly on the cache-hostile side of
   sorting algorithms, and does more compares too.
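
The compare-count point (item 1) is easy to check directly: wrap the
elements in a small class that counts __lt__ calls, then sort the same
random data once with sorted() and once with a heapq-based heapsort.  The
wrapper and harness below are mine, and the exact counts depend on the
heapq implementation, but on random data the heap route should come out
doing more compares:

    import heapq
    import random

    class Counted:
        """Float wrapper that tallies comparisons in a class attribute."""
        count = 0
        __slots__ = ("value",)

        def __init__(self, value):
            self.value = value

        def __lt__(self, other):
            Counted.count += 1
            return self.value < other.value

    def heapsorted(items):
        # Heapsort: heapify, then pop everything in order.
        heapq.heapify(items)
        return [heapq.heappop(items) for _ in range(len(items))]

    def count_compares(sort_func, data):
        Counted.count = 0
        sort_func([Counted(x) for x in data])
        return Counted.count

    random.seed(0)
    data = [random.random() for _ in range(100_000)]
    print("sorted()  compares:", count_compares(sorted, data))
    print("heapsort  compares:", count_compares(heapsorted, data))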

There's another way to get an iterative sort:  do an ordinary recursive
top-down mergesort, but instead of shuffling sublists in place, *generate*
the merge of the subsequences (which are themselves generators, etc).
That's a very elegant sort, with the remarkable property that the first
element of the final result is generated after doing exactly N-1 compares,
which achieves the theoretical minimum for finding the smallest element.
Getting result elements after that takes O(log(N)) additional compares each.
No array storage is needed beyond the original input list (which isn't
changed), but there are O(N) generators hiding in the runtime stack.  Alas,
for that reason it's impractical for large lists, and the overheads are
deadly for short lists.  It does enjoy the advantage of beauty <wink>.
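
Here's one way to render that idea in present-day Python (my own sketch of
the scheme described above, not code from the post).  The first element
falls out after exactly N-1 compares, at the cost of O(N) live generators:

    def lazy_merge(left, right):
        # Lazily merge two sorted iterators, one element at a time.
        sentinel = object()
        a = next(left, sentinel)
        b = next(right, sentinel)
        while a is not sentinel and b is not sentinel:
            if b < a:                # take right only when strictly smaller
                yield b
                b = next(right, sentinel)
            else:                    # ties go left, keeping the merge stable
                yield a
                a = next(left, sentinel)
        while a is not sentinel:     # drain whichever side remains
            yield a
            a = next(left, sentinel)
        while b is not sentinel:
            yield b
            b = next(right, sentinel)

    def lazy_mergesort(seq, lo=0, hi=None):
        # Yield seq[lo:hi] in sorted order without modifying seq.
        if hi is None:
            hi = len(seq)
        if hi - lo <= 1:
            if hi - lo == 1:
                yield seq[lo]
            return
        mid = (lo + hi) // 2
        yield from lazy_merge(lazy_mergesort(seq, lo, mid),
                              lazy_mergesort(seq, mid, hi))

    print(list(lazy_mergesort([3, 1, 4, 1, 5, 9, 2, 6])))
    # [1, 1, 2, 3, 4, 5, 6, 9]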

> In terms of memory, I think list.sort() always beats the above
> implementation.

That can't be -- the heap method only requires a fixed (independent of N)
and small amount of working storage.  list.sort() may need to allocate O(N)
additional temp bytes under the covers (to create a working area for doing
merges; it can be expected to allocate 2*N temp bytes for a random array of
len N, which is its worst case; if there's a lot of pre-existing order in
the input array, it can sometimes get away without allocating any temp
space).
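
A small illustration of the working-storage point (mine, not from the
post): heapify() rearranges its list in place, so beyond the one O(N) copy
the isorted() generator makes up front, the heap approach needs only O(1)
extra space; list.sort()'s temp area is allocated inside the C
implementation and isn't visible from Python code.

    import heapq
    import random

    data = [random.random() for _ in range(1_000_000)]

    heap = list(data)      # the one O(N) copy isorted() makes up front
    heapq.heapify(heap)    # rearranges that same list in place: O(1) extra
    first_ten = [heapq.heappop(heap) for _ in range(10)]

    # list.sort(), by contrast, may grab an O(N) temp area internally
    # while merging runs of a random array.
    copy = list(data)
    copy.sort()
    print(copy[:10] == first_ten)   # True: same answer either way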



