list.pop(0) vs. collections.dequeue
Alf P. Steinbach
alfps at start.no
Sat Jan 23 08:43:28 CET 2010
* Steve Howell:
> On Jan 22, 7:09 pm, Roy Smith <r... at panix.com> wrote:
>> In article
>> <3ac173bd-4124-434d-b726-0b9baaeec... at 36g2000yqu.googlegroups.com>,
>> Steve Howell <showel... at yahoo.com> wrote:
>>> In my case I'm not really concerned about giving the memory back right
>>> away, it's more about keeping my code simple. Once I'm done with an
>>> element, I do just want to pop it off and keep all the simplicity of
>>> the lists, but I am just concerned enough speed that for a 1000
>>> element list, I don't want to be doing 1000 memmoves for an average of
>>> 500 pointers, which effectively moves about a million bytes around for
>>> no reason.
>> If you really want to pop all the elements from a long list, reverse the
>> list and pop them off the end. Popping every element off the beginning of
>> the list is O(n^2), as you pointed out. Reversing the list is O(n), and
>> each pop after that is O(1), so the overall complexity is O(n).
> I really want to use list *normally* with all its perfectly good
> semantics and reasonable implementation, except for its blind spot
> with respect to popping the first element off the list. The whole
> reason I use CPython vs. C in the first place is that CPython
> programmers can generally program basic data structures better than I
> can. But list.pop(0) is the exception. And, with the possible
> exception of dicts, lists are the most fundamental data structures
> that Python has.
Having optimized list.pop() for first element, then you would have a blind spot
with respect to insertion at the front... Which could then be optimized for the
cases where there is some buffer space at the front, left over from a pop. I
think it would just be harder to understand and harder to explain. And I think
that due to that, as usual people would build their own elaborate
"explanations", with erroneous conclusions, and would then use it in inefficient
ways (like, popping from the middle or inserting at the front).
I think the "harder to use correctly" is the strongest argument against the
optimization, but there is also a non-obvious *memory overhead*...
Having popped just one element at the front, in order for the implementation to
not /accumulate/ unused memory, that is, in order to avoid an ongoing /memory
leak/, extending the buffer to accomodate e.g. an append() can no longer be done
as a (on relevant systems) constant time reallocation (letting the OS do its
virtual memory paging magic). Instead, a new buffer would have to be allocated
and all data copied over. One could still have amortized constant time for
appends by using a doubling buffer (which is the strategy used in C++
'std::vector'), but at the cost of wasting some memory, a percentage...
The implementation as a pure array is easy to understand and use correctly, and
doesn't waste memory.
But it would IMHO have been better if it wasn't called "list", which brings in
the wrong associations for someone used to other languages. The name does
matter. E.g. I don't think this discussion about a pop optimization would have
been there except for the name, which makes that optimization sound reasonable.
Perhaps some standard alternative name could be devised. Like, "array" would
have been nice, except that that's already grabbed by the array module.
> I know Python's number one concern will never be speed, but if Python
> makes an O(1) operation into an unnecessarily O(N) operation for no
> good reasons other than "it's too complicated, " or it "adds another
> pointer to the structure," or "it adds another conditional check to
> list_ass_slice for operations that aren't popping off the top," I
> think it's reasonable to challenge the design philosophy.
See above. In addition to being more difficult /to use/ correctly, that is,
being much easier to misunderstand, it incurs a memory overhead -- not the one
directly from the pop optimization, but by the requirements of buffer extension.
Essentially, as discussed above, it would then have to use a doubling buffer.
Cheers & hth.,
More information about the Python-list