[Python-Dev] collections module

Kurt B. Kaiser kbk at shore.net
Sat Jan 10 18:21:06 EST 2004


"Tim Peters" <tim.one at comcast.net> writes:

> [Kurt B. Kaiser]
>> Aren't array-based implementations of queue usually based on
>> circular buffers?
>
> Yes, but those are generally bounded queues.

And Pythonic versions would not be :-)

>> Head and tail pointers plus a count are sufficient.  Then a push is:
>> check count (resize if necessary), write, adjust head pointer, adjust
>> count.
>
> By analogy to a real-life queue (of, say, people), pushing is usually viewed
> as adding to the tail (when you get in line for movie tickets, you're likely
> to get punched if you add yourself to the head of the queue).  I have to
> insist on using that terminology, just because it ties my head in knots to
> swap the usual meanings.

Reading other posts in the thread and further reflection had already
converted me to that point of view.  We are agreed!  Put to the tail and
get from the head.  Also, agreed that ob_item _is_ the headpointer.

>>  A pull is similar, using the tail pointer.  Constant time.
>
> So is the speculative Python list gimmick we're talking about, until it runs
> out of room.  We use mildly exponential overallocation when growing a list
> to ensure amortized O(1) behavior regardless.

The whole game lies in what happens when you resize: how much data are
you going to copy?

>> The resize is a little tricky because the gap between the tail and
>> head pointers is what needs to grow.
>
> Resizing is easier with the speculative Python list gimmick, and 0-based
> indexing (where list[0] is the current head) is identical to the
> list-indexing implementation Python already has.  The latter is important
> because it's fast.  Needing to "wrap around" an index to account for
> circularity would introduce new test+branch expense into indexing.

If I understand the gimmick, as the queue is used the
ob_item/headpointer will walk down the block and items will be
appended at the tail until the tail collides with the end of the
block.  At that point the gimmick approach will memmove the queue to
place ob_item at the beginning of the block.  Since the block is
only a little longer than the queue, that's quite a bit of copying,
unless I'm missing something.

With a circular buffer approach, the headpointer and tailpointer will
walk down the block and when the end is reached the tailpointer will
wrap around and "appended" items will be stored at the low end of the
block.  The process continues without any copying.

The position of the pointers is calculated modulo the blocksize, so no
test/branch is involved.  If there is some space at the beginning of
the block which has been freed up by pop(0), the list will
automatically wrap around to fill it, even when the pop(0) has been
done in a non-queue context.

With a circular buffer, upon resize the gap between tail and head is
widened by the amount added to the block.  With a queue, that would on
average amount to copying half the data, but with a known stride.

One could take a conservative approach and rotate the buffer every
resize so that ob_item is at the beginning of the block, but doing
that seems to be unnecessary.

The question of which way to rotate the buffer into the new free space
seems to be irrelevant in the case of a queue/deque, but may be
interesting for ordinary lists.  I suspect pushing the tail to the
left is optimum to minimize copying.

[...]

>> Why not have listobject grow deque functionality using circular buffer
>> techniques?
>
> As above, I don't see much to recommend that over the speculative Python
> list gimmick:  indexing is slower and more complicated, 

pointer arithmetic is cheap compared to memory access.

> resizing is more complicated, 

I'm not sure that's the case.  And the gimmick approach has a
'complication' when the tail hits the end of the block even when the
queue lenth is constant.

> and all the listobject.c internals would get more complicated by
> needing to deal with that the list contents are no longer
> necessarily in a contiguous slice of a C vector (from the C POV, the
> list contents could be in two disjoint contiguous slices).  

Not if modulo pointer arithmetic is used.

> It would have the advantage that resizing is never needed until
> you're wholly out of room, but that seems minor compared to the
> drawbacks.

-- 
KBK



More information about the Python-Dev mailing list