[Python-Dev] collections module
Tim Peters
tim.one at comcast.net
Sat Jan 10 15:05:26 EST 2004
[Martin v. Loewis]
>> Shouldn't you malloc() in the case of prepending, and do the copying
>> yourself? r ealloc can only avoid copying if it can append a free
>> block. For the list, we know
>>
>> a) realloc is unlikely to find a new block, anyway, as it is pymalloc,
>> which never extends a small block.
[Guido]
> Not for large lists. pymalloc passes large objects off to real
> malloc.
pymalloc is irrelevant here: it's not used for list guts, just for list
headers. This is an efficiency hack, to speed the append-at-the-end use
case: as Martin says, pymalloc cannot extend in-place, so not using
pymalloc saves the cycles pymalloc would have consumed copying small list
guts repeatedly while the list was growing, and, after the list got large,
re-deducing over and over that it has to redirect to the system realloc.
This expense is exacerbated because Python doesn't keep track of how much
it's overallocated, it *always* calls realloc() on an append, (but "almost
always" calls realloc() with the same size it called realloc() with the last
time).
>> b) even if realloc would be able to extend the block, we still would
>> have to copy the data.
> For really large lists, you'd risk running out of memory due to the
> malloc(1GB), while realloc(1GB) very likely just extends (since such a
> large block effectively becomes in a memory segment of its own).
Right, that's the idea.
>> I don't know whether the overallocation would make similar-sized
>> room at both ends, or whether it would take the current operation
>> into account. For the specific case of queues, we may find that
>> doing *just* the copying might be sufficient, with no need to go
>> to the allocator.
For example, in this steady-state queue:
q = [x]
while True:
q.append(x)
q.pop(0)
q is never empty then, but never contains more than 1 element.
> Hm. An idea: don't bother overallocating when inserting in the front,
> but do keep the free space in the front if possible.
I'm not clear on what that means, and the devil's in the details.
> Then recommend that queues are implemented using append(x) and
> pop(0), rather than using insert(0, x) and pop().
Given the way realloc() works, I expect that add-at-the-right will always be
favored. If someone does "insert at the left", though, and we're out of
room, I'd give "the extra space" to the left end (we have to do a memmove in
that case regardless, and shifting stuff right N slots doesn't cost more
than shifting stuff right 1 slot). There are many possible strategies, and
it's worth some pain to make dequeue use of lists reasonably zippy too.
More information about the Python-Dev
mailing list