list.pop(0) vs. collections.dequeue

Sat Jan 23 04:38:55 EST 2010

* Steve Howell:
> On Jan 23, 12:32 am, "Alf P. Steinbach" <al... at start.no> wrote:
>> * Steve Howell:
>>
>>> On Jan 23, 12:13 am, Terry Reedy <tjre... at udel.edu> wrote:
>>>> Challenge yes, mock no.
>>>> Part of writing good basic data structures is not adding needless
>>>> complication from featuritis and not penalizing 99.99% of access to
>>>> satify a .01% need better satisfied another way.
>>> I would like to challenge your assertion that advancing ob_item
>>> instead of doing memmove during list_ass_slice would impact the
>>> performance of list accesses in any way.  It would only slow down
>>> operations that add/insert items into the list by, and then only by a
>>> single conditional statement, and those add/insert operations are
>>> already O(N) to begin with.
>> I'm sorry, no, the last part is incorrect.
>>
>> Appending to a 'list' can currently be constant time, if OS reallocation is
>> constant time (as the string '+' optimization relies on).
>>
> 
> That's true, but it's also irrelevant, as the pop optimization would
> happen in a branch of code that never gets executed during list
> appending:
> 
> 	if (d < 0) { /* Delete -d items */
>                 /* ADD CODE HERE TO AVOID memmove
>                    when ilow == 0 (i.e. ihigh + d == 0)
>                 */
> 		memmove(&item[ihigh+d], &item[ihigh],
> 			(Py_SIZE(a) - ihigh)*sizeof(PyObject *));
> 		list_resize(a, Py_SIZE(a) + d);
> 		item = a->ob_item;
> 	}
> 
> 
>> With the pop optimization it can no longer be constant time without risking an
>> accumulation of unused memory, a memory leak, although it can be amortized
>> constant time, at the cost of wasting some percentage of memory.
> 
> list_resize already overallocates memory to allow room for growth.
> Whenever you did an append to the list that would force a realloc, you
> could first check to see if there is unused stuff at the front and do
> the memmove then and reclaim the unfreed memory.  So instead of doing
> a paying for memmove on every pop [at front], you only pay for it when
> the list goes to size 0, 4, 8, 16, 25, 35, 46, 58, 72, 88, etc.

Oh. If 'list' already uses a doubling buffer then the only overhead from the 
optimization would, AFAICS, be a single add in every indexing. Which might be 
acceptable (at least it sounds very reasonable in the context of Python).

Re terminology: I write "doubling buffer" to mean increase of buffer size by a 
factor. It's often 2, but might be e.g. 1.5, whatever. The point of using a 
constant factor is to avoid quadratic time for loops doing appending, i.e. the 
constant factor size increase yields amortized constant time per append.

The sizes that you quote above, on the other hand, look like rather arbitrary 
buffer size increases where the size to increase by is limited to a certain 
small range. With copying or moving of everything at each buffer size increase 
that would not yield amortized constant time. It yield linear time, and 
quadratic time for a loop doing appends.

But, anyway, if 'list' already uses a doubling buffer then the only overhead 
from the pop optimization would, AFAICS, be a single add in every indexing.

On the third & gripping hand, however, a proof-of-concept actual implementation 
(patch) would be needed to ensure that one doesn't overlook any showstopper or 
serious problem, and to provide timings. And the special case would have to be 
documented as a special case. Hm, it would be nice if the Python docs offered 
complexity (time) guarantees in general...

Cheers,

- Alf