Performance of lists vs. list comprehensions
Steven D'Aprano
steve at REMOVE-THIS-cybersource.com.au
Tue Jan 19 21:52:16 EST 2010
On Tue, 19 Jan 2010 16:20:42 -0500, Gerald Britton wrote:
> That's surprising. I wouldn't implement it that way at all. I'd use a
> dynamically-expanding buffer as I suggested. That way you get a single
> pass and don't have to calculate anything before you begin. In the best
> case, you'd use half the memory (the result just fits in the buffer
> after its last expansion, with no saved tuple). In the worst case, the
> memory use is about the same (you just expanded the buffer under the 2x
> expansion rule, then hit the last item).
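
Gerald's scheme, as a rough Python sketch (the bytearray, the 8-byte
starting capacity and the 2x growth rule here are stand-ins for C-level
details, not anything from an actual patch):

    def join_single_pass(sep, items):
        # Single pass: copy each piece into a buffer that doubles
        # whenever the next piece won't fit.
        buf = bytearray(8)          # assumed starting capacity
        used = 0
        for i, piece in enumerate(items):
            if i:
                piece = sep + piece
            end = used + len(piece)
            if end > len(buf):
                cap = len(buf)
                while cap < end:    # assumed 2x expansion rule
                    cap *= 2
                buf.extend(b"\0" * (cap - len(buf)))
            buf[used:end] = piece
            used = end
        return bytes(buf[:used])    # worst case: len(buf) ~ 2 * used

    >>> join_single_pass(b", ", [b"spam", b"ham", b"eggs"])
    b'spam, ham, eggs'
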
In the worst case, you waste 50% of the memory allocated. And because
strings are immutable (unlike lists and dicts, which also over-allocate
but, being mutable, can grow into the spare space later), you can never
use that memory until the string is garbage collected.
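
To put a number on it (the starting size and growth factor are
assumed): with an 8-byte buffer and doubling, a 1025-byte result drags
the allocation up to 2048 bytes, stranding 1023 bytes inside the
immutable string:

    cap = 8                  # assumed starting capacity
    while cap < 1025:
        cap *= 2             # 8 -> 16 -> ... -> 1024 -> 2048
    print(cap, cap - 1025)   # 2048 1023, i.e. ~50% wasted
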
In the current approach, join produces a temporary sequence, but it
doesn't last very long. With your suggested approach, you could end up
with a large number of long-lasting strings, each up to twice the
necessary size. Since join is particularly useful for building large
strings, this could be a significant memory pessimization.
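
For contrast, here is the current two-pass approach as a pure-Python
approximation (the shape of it, not the real C code): materialise the
temporary sequence, total up the lengths, make one exact-size
allocation, then copy.

    def join_two_pass(sep, items):
        pieces = list(items)    # the short-lived temporary sequence
        if not pieces:
            return b""
        # First pass: the exact result size, known before copying.
        total = (len(sep) * (len(pieces) - 1)
                 + sum(len(p) for p in pieces))
        buf = bytearray(total)  # one allocation, zero slack
        # Second pass: copy each piece (and separator) into place.
        pos = 0
        for i, piece in enumerate(pieces):
            if i:
                buf[pos:pos + len(sep)] = sep
                pos += len(sep)
            buf[pos:pos + len(piece)] = piece
            pos += len(piece)
        return bytes(buf)

The exact-size allocation is precisely what the expanding buffer gives
up.
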
The obvious fix is for join to shrink the buffer once it has finished
building the string, but the success of that may be operating-system
dependent. I don't know -- it sounds like a recipe for memory
fragmentation to me. And even if it can be done, reclaiming the memory
will take time, potentially more than the current approach takes.
Still, it's probably worth trying, and seeing if it speeds join up.
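
A crude way to experiment from Python, pitting the two sketches above
against each other (a hypothetical micro-benchmark; it can only hint at
relative behaviour, since the real join does this work in C):

    import time

    pieces = [b"x" * 10] * 100000

    for fn in (join_two_pass, join_single_pass):
        start = time.time()
        for _ in range(20):
            fn(b", ", pieces)
        print(fn.__name__, time.time() - start)
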
--
Steven