Performance of lists vs. list comprehensions
Steven D'Aprano
steve at REMOVE-THIS-cybersource.com.au
Wed Jan 20 01:12:18 EST 2010
On Wed, 20 Jan 2010 05:25:22 +0100, Alf P. Steinbach wrote:
> * Steven D'Aprano:
>> On Tue, 19 Jan 2010 16:20:42 -0500, Gerald Britton wrote:
>>
>>> That's surprising. I wouldn't implement it that way at all. I'd use a
>>> dynamically-expanding buffer as I suggested. That way you get a
>>> single pass and don't have to calculate anything before you begin. In
>>> the best case, you'd use half the memory (the result just fits in the
>>> buffer after its last expansion and no saved tuple). In the worst
>>> case, the memory use is about the same (you just expanded the buffer
>>> using a 2x expansion rule then you hit the last item).
>>
>> In the worst case, you waste 50% of the memory allocated.
>
> Yes. That is a good argument for not doing the expanding buffer thing.
> But such buffers may be generally present anyway, resulting from
> optimization of "+".
As near as I can determine, the CPython optimization you are referring to
doesn't use the "double the buffer when needed" technique; it operates on
a completely different strategy. As best I can tell (as a non-C speaker),
it re-sizes the string in place to the size actually needed, thus
reducing the amount of copying required.
The optimization patch is here:
http://bugs.python.org/issue980695
and some history (including Guido's opposition to the patch) here:
http://mail.python.org/pipermail/python-dev/2004-August/046686.html
The patch has been part of CPython since 2.4, but it must be considered
an implementation-specific optimization: it doesn't apply to Jython, and
likely not to other implementations either.
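For what it's worth, here is a rough (modern-Python) micro-benchmark of the
kind of thing being discussed. The function name and sizes are my own, not
from the thread; the point is only that on CPython, where the in-place
resize optimization can kick in, doubling n should roughly double the time
rather than quadruple it. On other implementations you may well see
quadratic growth instead.

```python
import timeit

def concat(n):
    # Build a string of n characters with repeated "+=". On CPython the
    # string may be resized in place (it has only one reference), so the
    # whole loop can run in roughly O(n) time.
    s = ""
    for _ in range(n):
        s += "x"
    return s

for n in (10000, 20000, 40000):
    t = timeit.timeit(lambda: concat(n), number=5)
    print(n, round(t, 4))
```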
> Using CPython 2.6.4 in Windows XP:
[snip time measurements]
> Here the linear increase of times indicate that the "+" is being
> optimized using an expanding buffer for the string.
Be careful about concluding from timings alone that a certain algorithm
is used. All you can really tell is that, whatever the algorithm is, it
has such-and-such big-oh behaviour.
> If only the minimal
> space was allocated each time then one would expect O(n^2) behavior
> instead of the apparently O(n) above.
I don't follow that reasoning.
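For reference, the usual counting argument behind the O(n^2) figure goes
like this: if every "+=" allocates a brand-new string, then appending one
character to a string of length k copies k+1 characters, so n appends copy
1 + 2 + ... + n = n(n+1)/2 characters in total. A sketch (the helper name
is mine, purely illustrative):

```python
def total_copies(n):
    # Characters copied if every append reallocates from scratch:
    # appending to a string of length k copies k+1 characters.
    return sum(k + 1 for k in range(n))

# Closed form: n(n+1)/2, i.e. quadratic in n.
print(total_copies(100))  # 5050 == 100 * 101 // 2
```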
> Example of that O(n^2) behavior given below.
The example shown demonstrates that the + optimization only applies in
certain specific cases. In fact, it only applies to appending:
s += t
s = s + t
but not prepending or concatenation with multiple strings:
s = t + s
s += t + u
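A quick, CPython-specific way to see the difference is to time the two
forms side by side. The function names below are illustrative, not from
the thread; on CPython the appending version can benefit from the in-place
resize, while the prepending version builds a new string on every
iteration:

```python
import timeit

def append(n):
    s = ""
    for _ in range(n):
        s += "x"       # eligible for CPython's in-place resize
    return s

def prepend(n):
    s = ""
    for _ in range(n):
        s = "x" + s    # not eligible: a fresh string every time
    return s

for f in (append, prepend):
    print(f.__name__, round(timeit.timeit(lambda: f(20000), number=3), 4))
```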
However, keep in mind that the CPython keyhole optimizer will take
something like this:
s += "x" + "y"
and turn it into this:
s += "xy"
which the concatenation optimization does apply to. Optimizations make it
hard to reason about the behaviour of algorithms!
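You can watch the keyhole (peephole) optimizer do this with the dis
module. In current CPython the two literals are folded into one constant
at compile time, so 'xy' appears in the compiled code's constants and the
disassembly shows a single LOAD_CONST for it:

```python
import dis

code = compile('s += "x" + "y"', "<example>", "exec")
dis.dis(code)  # shows LOAD_CONST 'xy': the fold happened at compile time
print("xy" in code.co_consts)
```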
--
Steven