Delete all items in the list

Terry Reedy tjreedy at udel.edu
Sat Feb 28 12:46:13 EST 2009


Steven D'Aprano wrote:

> Big Oh notation is good for estimating asymptotic behaviour, which means it
> is good for predicting how an algorithm will scale as the size of the input
> increases. It is useless for predicting how fast that algorithm will run,
> since the actual speed depends on those constant factors that Big Oh
> neglects. That's not a criticism of Big Oh -- predicting execution speed is
> not what it is for.
> 
> For what it's worth, for very small lists, the while...remove algorithm is
> actually faster that using a list comprehension, despite being O(M*N)
> versus O(N), at least according to my tests. It's *trivially* faster, but
> if you're interested in (premature) micro-optimisation, you might save one
> or two microseconds by using a while loop for short lists (say, around
> N<=12 or so according to my tests), and swapping to a list comp only for
> larger input.
> 
> Now, this sounds silly, and in fact it is silly for the specific problem
> we're discussing, but as a general technique it is very powerful. For
> instance, until Python 2.3, list.sort() used a low-overhead but O(N**2)
> insertion sort for small lists, only kicking off a high-overhead but O(N)

O(NlogN)

> sample sort above a certain size. The current timsort does something
> similar, only faster and more complicated. If I understand correctly, it
> uses insertion sort to make up short runs of increasing or decreasing
> values, and then efficiently merges them.

It uses binary insertion sort to make runs of 64 (determined by 
empirical testing).  The 'binary' part means that it uses O(logN) binary 
search rather than O(n) linear search to find the insertion point for 
each of N items, so that finding insertion points uses only O(NlogN) 
comparisions (which are relatively expensive).  Each of the N insertions 
is done with a single memmove() call, which typically is relatively 
fast.  So although binary insertion sort is still O(N*N), the 
coefficient of the N*N term in the time formula is relatively small.

The other advantages of timsort are that it exploits existing order in a 
list while preserving the order of items that compare equal.

Terry Jan Reedy





More information about the Python-list mailing list