Performance of lists vs. list comprehensions

Steven D'Aprano steve at REMOVE-THIS-cybersource.com.au
Tue Jan 19 21:52:16 EST 2010


On Tue, 19 Jan 2010 16:20:42 -0500, Gerald Britton wrote:

> That's surprising. I wouldn't implement it that way at all.  I'd use a
> dynamically-expanding buffer as I suggested.  That way you get a single
> pass and don't have to calculate anything before you begin.  In the best
> case, you'd use half the memory (the result just fits in the buffer
> after its last expansion and no saved tuple).  In the worst case, the
> memory use is about the same (you just expanded the buffer using a 2x
> expansion rule then you hit the last item).

In the worst case, you waste 50% of the memory allocated. And because 
strings are immutable (unlike lists and dicts, which also use this 
approach), you can never use that memory until the string is garbage 
collected.

In the current approach, join produces a temporary sequence, but it 
doesn't last very long. With your suggested approach, you could end up 
with a large number of long-lasting strings up to twice the size 
necessary. Since join is particularly useful for building large strings, 
this could be a significant memory pessimation.

The obvious fix is for join to shrink the buffer once it has finished 
building the string, but the success of that may be operating system 
dependent. I don't know -- it sounds like a recipe for memory 
fragmentation to me. And even if it can be done, reclaiming the memory 
will take time, potentially more time than the current approach.

Still, it's probably worth trying, and seeing if it speeds join up.


-- 
Steven



More information about the Python-list mailing list