alwayssortedlist (was Re: On PEP 322 (ireverse))
Tim Peters
tim.one at comcast.net
Tue Nov 4 03:11:34 CET 2003
[timbot]
>> bisect.insort() will eventually beat the pants off list.sort(),
>> and the more expensive element comparisons are, the shorter the
>> list at which insort first wins.
[martellibot]
> Yes, and even with a list of integers insort gets on top pretty fast,
> e.g., even just with 100 items or so:
>
> [alex at lancelot bo]$ timeit.py -c -s'import bisect' -s'LL=range(0,
> 333, 3)' 'L=LL[:]; L.append(33); L.sort()'
> 10000 loops, best of 3: 17 usec per loop
>
> [alex at lancelot bo]$ timeit.py -c -s'import bisect' -s'LL=range(0,
> 333, 3)' 'L=LL[:]; bisect.insort(L, 33)'
> 100000 loops, best of 3: 12.4 usec per loop
>
> thanks for the clear explanation (and for upholding bots' honor...).
bots helping bots -- it's a way of life.
IIRC, you weren't originally advocating calling .sort() after each append, but
rather bunching up appends until an access operation. That's more interesting,
and the tradeoffs can change.
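That batching strategy can be sketched as a tiny wrapper class (the
LazySortedList name and API are invented here, not from the thread): appends
just accumulate, and a deferred sort runs on the next access, so K appends
between accesses cost one sort instead of K of them.

```python
import bisect

class LazySortedList:
    """Sketch of a lazily-sorted list: appends are O(1), and the
    sort is deferred until the next access that needs order."""

    def __init__(self, iterable=()):
        self._data = sorted(iterable)
        self._dirty = False

    def append(self, item):
        # Just accumulate; don't pay for sorting yet.
        self._data.append(item)
        self._dirty = True

    def _fixup(self):
        # One sort covers any number of pending appends.
        if self._dirty:
            self._data.sort()
            self._dirty = False

    def __getitem__(self, i):
        self._fixup()
        return self._data[i]

    def __contains__(self, item):
        self._fixup()
        i = bisect.bisect_left(self._data, item)
        return i < len(self._data) and self._data[i] == item

    def __len__(self):
        return len(self._data)
```

Whether this beats insort-per-append depends on the append/access ratio,
which is exactly the tradeoff discussed below.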
Supposing we have a sorted list of N elements, and append K random elements,
where K is much smaller than N:
bisect.insort() takes about log2(N) comparisons for each insertion, and
shuffles about half the elements down a slot each time. So, rough orders:
compares: K * log2 N
moves: K * N
OTOH, list.sort() will discover that the long prefix is already sorted after
about N compares, will sort the small tail with about K log2 K compares
(worst case) and moves, and merge both sequences into place with about N
total moves. The number of compares used by the merge is hard to guess,
because "galloping mode" kicks in and is likely to be very effective when
the slices to merge have very different sizes. 2 * K * log2 N isn't a bad
guess.
compares: N + K * log2 K + 2 * K * log2 N
moves: N + K * log2 K
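To put rough numbers on the comparison estimates (a sketch of mine, not from
the post -- the CountingInt class is invented here), an int subclass can count
__lt__ calls; subclassing also defeats CPython's exact-int compare fast path,
so __lt__ really gets called by both bisect.insort() and list.sort():

```python
import bisect

class CountingInt(int):
    """int subclass whose __lt__ counts how often it's called."""
    compares = 0
    def __lt__(self, other):
        CountingInt.compares += 1
        return int.__lt__(self, other)

N, K = 10_000, 20
base = [CountingInt(i) for i in range(N)]      # already-sorted prefix
tail = [CountingInt(i) for i in range(-K, 0)]  # K new elements

# Path 1: K binary insertions -- expect roughly K * log2(N) compares.
L = base[:]
CountingInt.compares = 0
for x in tail:
    bisect.insort(L, x)
insort_compares = CountingInt.compares

# Path 2: append everything, then one sort -- expect roughly
# N + K*log2(K) + merge compares, dominated by the N term.
L = base + tail
CountingInt.compares = 0
L.sort()
sort_compares = CountingInt.compares

print("insort:", insort_compares, "sort:", sort_compares)
```

With these sizes insort uses a few hundred compares while sort uses upwards
of N, matching the estimates; the move counts tell the opposite story.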
So when K is 1 the # of moves is about the same either way, but insort uses
far fewer compares. As K gets larger, though, insort can be expected to start
doing a hell of a lot more moves; even though compares are more expensive
than moves, do enough moves and the balance changes.
For example, on my home box, if I start with the list range(1000000), and
add range(-20, 0) to it, inserting those 20 new guys one at a time via
insort() takes more than twice as long as extending the list with the new
guys once and letting sort() do it all.
Boost that to range(-50, 0), and sort() time is virtually unchanged (note
that plain N is the dominant term for both compares and moves for sort()
above, so boosting K *shouldn't* matter much while K is much smaller than
N), but insort() time more than doubles (the K*N #-of-moves term dominates,
and goes up linearly with K even though K is small).
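That experiment is easy to reproduce at smaller scale (a sketch of mine, not
the original script; absolute timings are machine-dependent, and the winner
can shift with N, K, and hardware):

```python
import bisect
import timeit

N, K = 100_000, 50        # smaller than the million-element test above
base = list(range(N))
new = list(range(-K, 0))  # K new elements, all landing at the front

def via_insort():
    L = base[:]
    for x in new:          # K separate binary insertions
        bisect.insort(L, x)
    return L

def via_sort():
    L = base + new         # extend once, then one sort call
    L.sort()
    return L

t_insort = min(timeit.repeat(via_insort, number=5, repeat=3))
t_sort = min(timeit.repeat(via_sort, number=5, repeat=3))
print(f"insort: {t_insort:.4f}s   sort: {t_sort:.4f}s")
```

Both paths produce the same sorted list; only the costs differ, and trying a
few (N, K) pairs shows the crossover moving around.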
The moral of the story is that if you need the list to be in sorted order
always, use a BTree <wink>.
More information about the Python-list mailing list