Confounded by Python objects

Sun Jul 27 14:04:27 EDT 2008

Steven D'Aprano wrote:
> On Sat, 26 Jul 2008 18:54:22 +0000, Robert Latest wrote:
> 
>> Here's an interesting side note: After fixing my "Channel" thingy the
>> whole project behaved as expected. But there was an interesting hitch.
>> The main part revolves around another class, "Sequence", which has a
>> list of Channels as attribute. I was curious about the performance of my
>> script, because eventually this construct is supposed to handle
>> megabytes of data. So I wrote a simple loop that creates a new Sequence,
>> fills all the Channels with data, and repeats.
>>
>> Interestingly, the first couple of dozen iterations went satisfactorily
>> quickly (took about 1 second total), but after a hundred or so times it
>> got really slow -- like a couple of seconds per iteration.
>>
>> Playing around with the code, not really knowing what to do, I found
>> that in the "Sequence" class I had again erroneously declared a
>> class-level attribute -- rather harmlessly, just a string, that got
>> assigned to once in each iteration on object creation.
>>
>> After I had deleted that, the loop went blindingly fast without slowing
>> down.
>>
>> What are the mechanics behind this behavior?
> 
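For readers unsure what a "class-level attribute" means here, a minimal sketch follows. Robert's code was not posted, so the class and method names below are hypothetical, and the sketch only shows how class-level and instance-level attributes differ; it does not by itself explain the slowdown.

```python
class SharedSequence:
    # Class-level attribute: a single list shared by every instance.
    channels = []

    def add(self, value):
        # Mutates the one shared list, whichever instance is used.
        self.channels.append(value)


class OwnSequence:
    def __init__(self):
        # Instance-level attribute: each object gets a fresh list.
        self.channels = []

    def add(self, value):
        self.channels.append(value)


a, b = SharedSequence(), SharedSequence()
a.add(1)
b.add(2)
print(a.channels)   # [1, 2] -- both values ended up in the shared list

c, d = OwnSequence(), OwnSequence()
c.add(1)
d.add(2)
print(c.channels)   # [1] -- d's value went into d's own list
```

Note that plain assignment through an instance (self.name = value) creates an instance attribute that shadows the class-level one rather than changing it.
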
> Without actually seeing the code, it's difficult to be sure, but my guess 
> is that you were accidentally doing repeated string concatenation. This 
> can be very slow.
> 
> In general, anything that looks like this:
> 
> s = ''
> for i in range(10000):  # or any big number
>     s = s + 'another string'
> 
> can be slow. Very slow.

But this is way faster:

s = ''
for i in range(10000):  # or any big number
    s += 'another string'

(snip)

> It's harder to stumble across the slow behaviour these days, as Python 
> 2.4 introduced an optimization that, under some circumstances, makes 
> string concatenation almost as fast as using join().

Yep: by using augmented assignment (s += some_string) instead of
concatenation and rebinding (s = s + some_string).

> But be warned: join()
> is still the recommended approach. Don't count on this optimization to 
> save you from slow code.
>
> If you want to see just how slow repeated concatenation is compared to 
> joining, try this:
> 
> 
> >>> import timeit
> >>> t1 = timeit.Timer('for i in xrange(1000): x=x+str(i)+"a"', 'x=""')
> >>> t2 = timeit.Timer('"".join(str(i)+"a" for i in xrange(1000))', '')
> >>>
> >>> t1.repeat(number=30)
> [0.8506159782409668, 0.80239105224609375, 0.73254203796386719]
> >>> t2.repeat(number=30)
> [0.052678108215332031, 0.052067995071411133, 0.052803993225097656]
> 
> Concatenation is more than ten times slower in the example above,

Not when using augmented assignment:

 >>> from timeit import Timer
 >>> t1 = Timer('for i in xrange(1000): x+= str(i)+"a"', 'x=""')
 >>> t2 = Timer('"".join(str(i)+"a" for i in xrange(1000))', '')
 >>> t1.repeat(number=30)
[0.07472991943359375, 0.064207077026367188, 0.064996957778930664]
 >>> t2.repeat(number=30)
[0.071865081787109375, 0.061071872711181641, 0.06132817268371582]
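
For anyone who wants to rerun this comparison on a current interpreter, here is a minimal sketch for Python 3 (range instead of xrange). The function names are mine, not from the thread, and the timings will depend on the interpreter and its version, since the in-place concatenation shortcut is a CPython implementation detail rather than a language guarantee.

```python
import timeit


def concat_rebind(n):
    # Concatenation and rebinding: x = x + ...
    x = ""
    for i in range(n):
        x = x + str(i) + "a"
    return x


def concat_augmented(n):
    # Augmented assignment: x += ...
    x = ""
    for i in range(n):
        x += str(i) + "a"
    return x


def concat_join(n):
    # Build all the pieces, then join them once.
    return "".join(str(i) + "a" for i in range(n))


for func in (concat_rebind, concat_augmented, concat_join):
    # Same shape as the sessions above: 3 repeats of 30 runs each.
    timings = timeit.repeat(lambda: func(1000), repeat=3, number=30)
    print(func.__name__, min(timings))
```

All three functions build the same string, so any difference in the printed numbers comes from how the string is assembled.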

(snip)

> And even worse:
> 
> >>> t1.repeat(number=50)
> [2.7190279960632324, 2.6910948753356934, 2.7089321613311768]
> >>> t2.repeat(number=50)
> [0.087616920471191406, 0.088094949722290039, 0.087819099426269531]
> 

Not that much worse here:

 >>> t1.repeat(number=50)
[0.12305188179016113, 0.10764503479003906, 0.10605692863464355]
 >>> t2.repeat(number=50)
[0.11200308799743652, 0.10315108299255371, 0.10278487205505371]
 >>>

I'd still advise using the sep.join(seq) approach, but not for
performance reasons.
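
A minimal sketch of the sep.join(seq) idiom, written for Python 3. The helper name render and its sep parameter are hypothetical, chosen only for illustration. It collects the pieces in a list and joins them once at the end.

```python
def render(values, sep=", "):
    # join() only accepts strings, so convert each piece first.
    parts = []
    for value in values:
        parts.append(str(value))
    # The separator is written once and goes only between the pieces.
    return sep.join(parts)


print(render(range(5)))        # 0, 1, 2, 3, 4
print(render("abc", sep="-"))  # a-b-c
```

One possible readability argument, which is my reading rather than something stated in the thread: the separator is handled in one place, with no leading or trailing leftover to strip afterwards.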