Deleting the first element of a list

Alex Martelli aleax at aleax.it
Thu Oct 3 02:23:16 EDT 2002


Peter Hansen wrote:

> Bruce Dawson wrote:
>> Then it should be much better to go:
>> 
>> myList.reverse()
>> while myList:
>>     # Do Something with the last element
>>     del myList[len(myList)-1]

Incidentally, "del myList[-1]" is a more readable way to express this.

>> This changes it from O(n^2) to O(n), which could make it run thousands
>> of times faster.
> 
> You're making assumptions, possibly good ones, about the implementation
> of the list object in Python.  It's possible, though obviously unlikely,
> that the list is reallocated and copied every time it shrinks by even
> one element.  (I'm just pointing this out for discussion purposes...
> no doubt it doesn't work that way.)  That would leave it at O(n^2).

Python does not specify the O()-behavior of its built-in type's operations
(the way C++'s standard library does), but I think I have the next best
thing -- Tim Peters' review of my forthcoming Nutshell book's chapter on
optimization, which does devote some time to documenting that behavior.
For x>0, "del somelist[x]" is O(len(somelist)-x) -- you can count on
that as well as you can count on most things in this sublunar orb.

IOW, I don't think these "discussion purposes" serve any concrete
purpose in this context.

> Unless you profiled this already under a particular implementation of
> Python and are sure it really is a question of O(n) vs. O(n^2)...

"Only a madman can be absolutely sure", but that applies to just
about anything.  That doesn't and shouldn't stop us from acting
on the basis of "reasonable certainty".


> Likewise, even with the original question, it's possible that for
> del list[0] vs. list = list[1:] the overhead of allocating a new
> list and deallocating the old one is negligible compared to the
> cost of copying n-1 references around in memory.  In that case there
> would be only minor performance differences between them for
> suitable values of n.

Yes, allocation costs cannot be guaranteed -- THAT is a point worth
making.  Similarly, no guarantees can be offered regarding such
things as caching effects, VM thrashing, and other by-products of
the existence of memory hierarchies (maybe that's why Cray boxes
tended to avoid such hierarchies...).

"del alist[0]" must copy N-1 references; "alist = alist[1]" must
perform the same copies plus one allocation and one freeing.  So,
the former's performance "can't" (with reasonable certainty) be
worse than the latter's, but you can't predict how much better --
surely not across platforms.  Still, if you must choose between
the two ways of expressing functionality that is equivalent to
your application, the former is clearly better.

As between "del alist[0]" and "del alist[-1]", the case is even
sharper: the latter's O(1) [at worst in an amortized sense,
though I don't think you need this proviso in any current or
planned Python implementation], the former's O(N) [ditto].

Of course, if any user-coded __del__ is triggered by any of
these (if it is by one, it should be by all), then all bets are
off -0- there are no limits to the amount of computation that
such a __del__ could perform, and thus to its ill effects on
a program's overall performance.


Alex




More information about the Python-list mailing list