String concatenation benchmarking weirdness
sg552 at hotmail.co.uk
Fri Jan 11 21:51:27 CET 2013
On 11/01/2013 20:16, Ian Kelly wrote:
> On Fri, Jan 11, 2013 at 12:03 PM, Rotwang <sg552 at hotmail.co.uk> wrote:
>> Hi all,
>> the other day I 2to3'ed some code and found it ran much slower in 3.3.0 than
>> 2.7.2. I fixed the problem but in the process of trying to diagnose it I've
>> stumbled upon something weird that I hope someone here can explain to me.
>> [stuff about timings]
>> Is my guess correct? If not, what is going on? If so, is it possible to
>> explain to a programming noob how the interpreter does this?
> Basically, yes. You can find the discussion behind that optimization at:
> It knows when there are other references to the string because all
> objects in CPython are reference-counted. It also works despite your
> attempts to "fool" it because after evaluating the first operation
> (which is easily optimized to return the string itself in both cases),
> the remaining part of the expression is essentially "x = TOS + 'a'",
> where x and the top of the stack are the same string object, which is
> the same state the original code reaches after evaluating just the x.
> The stated use case for this optimization is to make repeated
> concatenation more efficient, but note that it is still generally
> preferable to use the ''.join() construct, because the optimization is
> specific to CPython and may not exist for other Python implementations.
The slowdown in my code was caused by a method that built up a string of
bytes by repeatedly using +=, before writing the result to a WAV file.
My fix was to replace the bytes string with a bytearray, which seems
about as fast as the rewrite I just tried with b''.join. Do you know
whether the bytearray method will still be fast on other implementations?
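
For anyone who wants to compare the three approaches on their own
interpreter, here is a minimal sketch. It is not the original WAV-writing
code: CHUNK, N and the function names are made up for illustration, and
the timings will depend on the implementation and version, so no results
are claimed here.

```python
import timeit

CHUNK = b"\x00\x01" * 64   # hypothetical stand-in for a block of sample data
N = 1000                   # number of chunks to append; arbitrary


def build_with_bytes():
    # Repeated += on an immutable bytes object; each step may copy
    # everything accumulated so far.
    data = b""
    for _ in range(N):
        data += CHUNK
    return data


def build_with_bytearray():
    # bytearray is mutable, so += extends the existing buffer in place.
    data = bytearray()
    for _ in range(N):
        data += CHUNK
    return bytes(data)


def build_with_join():
    # Collect the pieces in a list and concatenate once at the end.
    parts = []
    for _ in range(N):
        parts.append(CHUNK)
    return b"".join(parts)


if __name__ == "__main__":
    for func in (build_with_bytes, build_with_bytearray, build_with_join):
        seconds = timeit.timeit(func, number=10)
        print(f"{func.__name__}: {seconds:.4f} s")
```

All three functions return the same bytes, so only the running time
differs. The bytearray and join versions do not rely on the CPython
reference-count trick described above.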
I have made a thing that superficially resembles music: