Confounded by Python objects
Bruno Desthuilliers
bdesth.quelquechose at free.quelquepart.fr
Sun Jul 27 14:04:27 EDT 2008
Steven D'Aprano wrote:
> On Sat, 26 Jul 2008 18:54:22 +0000, Robert Latest wrote:
>
>> Here's an interesting side note: After fixing my "Channel" thingy the
>> whole project behaved as expected. But there was an interesting hitch.
>> The main part revolves around another class, "Sequence", which has a
>> list of Channels as attribute. I was curious about the performance of my
>> script, because eventually this construct is supposed to handle
>> megabytes of data. So I wrote a simple loop that creates a new Sequence,
>> fills all the Channels with data, and repeats.
>>
>> Interestingly, the first couple of dozen iterations went satisfactorily
>> quickly (took about 1 second total), but after a hundred or so times it
>> got really slow -- like a couple of seconds per iteration.
>>
>> Playing around with the code, not really knowing what to do, I found
>> that in the "Sequence" class I had again erroneously declared a
>> class-level attribute -- rather harmlessly, just a string, that got
>> assigned to once in each iteration on object creation.
>>
>> After I had deleted that, the loop went blindingly fast without slowing
>> down.
>>
>> What's the mechanics behind this behavior?
>
> Without actually seeing the code, it's difficult to be sure, but my guess
> is that you were accidentally doing repeated string concatenation. This
> can be very slow.
>
> In general, anything that looks like this:
>
> s = ''
> for i in range(10000): # or any big number
>     s = s + 'another string'
>
> can be slow. Very slow.
But this is way faster:
s = ''
for i in range(10000): # or any big number
    s += 'another string'
(snip)
> It's harder to stumble across the slow behaviour these days, as Python
> 2.4 introduced an optimization that, under some circumstances, makes
> string concatenation almost as fast as using join().
Yep: by using augmented assignment (s += some_string) instead of
concatenation and rebinding (s = s + some_string).
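A minimal sketch of the two spellings side by side, for anyone who wants to measure them on their own interpreter. It is written for Python 3 (an assumption; the session in this thread is Python 2), and the function names concat_rebind and concat_augmented are made up for illustration. No timings are shown, since they depend on the interpreter and its version.

```python
import timeit


def concat_rebind(n):
    # Concatenation and rebinding: s = s + piece
    s = ''
    for i in range(n):
        s = s + 'another string'
    return s


def concat_augmented(n):
    # Augmented assignment: s += piece
    s = ''
    for i in range(n):
        s += 'another string'
    return s


if __name__ == '__main__':
    # Report the best of three runs for each spelling.
    for func in (concat_rebind, concat_augmented):
        best = min(timeit.repeat(lambda: func(10000), number=30, repeat=3))
        print(func.__name__, best)
```

Both functions return the same string; only the time taken may differ.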
> But be warned: join()
> is still the recommended approach. Don't count on this optimization to
> save you from slow code.
>
> If you want to see just how slow repeated concatenation is compared to
> joining, try this:
>
>
> >>> import timeit
> >>> t1 = timeit.Timer('for i in xrange(1000): x=x+str(i)+"a"', 'x=""')
> >>> t2 = timeit.Timer('"".join(str(i)+"a" for i in xrange(1000))', '')
> >>>
> >>> t1.repeat(number=30)
> [0.8506159782409668, 0.80239105224609375, 0.73254203796386719]
> >>> t2.repeat(number=30)
> [0.052678108215332031, 0.052067995071411133, 0.052803993225097656]
>
> Concatenation is more than ten times slower in the example above,
Not when using augmented assignment:
>>> from timeit import Timer
>>> t1 = Timer('for i in xrange(1000): x+= str(i)+"a"', 'x=""')
>>> t2 = Timer('"".join(str(i)+"a" for i in xrange(1000))', '')
>>> t1.repeat(number=30)
[0.07472991943359375, 0.064207077026367188, 0.064996957778930664]
>>> t2.repeat(number=30)
[0.071865081787109375, 0.061071872711181641, 0.06132817268371582]
(snip)
> And even worse:
>
> >>> t1.repeat(number=50)
> [2.7190279960632324, 2.6910948753356934, 2.7089321613311768]
> >>> t2.repeat(number=50)
> [0.087616920471191406, 0.088094949722290039, 0.087819099426269531]
>
Not that much worse here:
>>> t1.repeat(number=50)
[0.12305188179016113, 0.10764503479003906, 0.10605692863464355]
>>> t2.repeat(number=50)
[0.11200308799743652, 0.10315108299255371, 0.10278487205505371]
>>>
I'd still advise using the sep.join(seq) approach, but not for
performance reasons.
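For reference, a minimal sketch of the sep.join(seq) idiom: collect the pieces in a list, then join once at the end. The helper name build_with_join is hypothetical, and the code assumes Python 3. One commonly cited reason to prefer it (an assumption here, not something stated in this thread) is that fast in-place concatenation is a CPython implementation detail, while join() behaves the same way everywhere.

```python
def build_with_join(n, sep=''):
    # Collect the pieces first, then join them once at the end.
    pieces = []
    for i in range(n):
        pieces.append(str(i) + 'a')
    return sep.join(pieces)


print(build_with_join(3))        # 0a1a2a
print(build_with_join(3, ', '))  # 0a, 1a, 2a
```

The separator is given in one place, which also makes it easy to change later.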