[Python-Dev] accumulator display syntax
Guido van Rossum
guido at python.org
Tue Oct 21 00:09:11 EDT 2003
> with a.py having:
> def asum(R):
>     return sum([ x*x for x in R ])
>
> def gen(R):
>     for x in R: yield x*x
> def gsum(R, gen=gen):
>     return sum(gen(R))
>
> I measure:
>
> [alex at lancelot auto]$ timeit.py -c -s'import a' -s'R=range(100)' 'a.asum(R)'
> 10000 loops, best of 3: 96 usec per loop
> [alex at lancelot auto]$ timeit.py -c -s'import a' -s'R=range(100)' 'a.gsum(R)'
> 10000 loops, best of 3: 60 usec per loop
> [alex at lancelot auto]$ timeit.py -c -s'import a' -s'R=range(1000)' 'a.asum(R)'
> 1000 loops, best of 3: 930 usec per loop
> [alex at lancelot auto]$ timeit.py -c -s'import a' -s'R=range(1000)' 'a.gsum(R)'
> 1000 loops, best of 3: 590 usec per loop
> [alex at lancelot auto]$ timeit.py -c -s'import a' -s'R=range(10000)' 'a.asum(R)'
> 100 loops, best of 3: 1.28e+04 usec per loop
> [alex at lancelot auto]$ timeit.py -c -s'import a' -s'R=range(10000)' 'a.gsum(R)'
> 100 loops, best of 3: 8.4e+03 usec per loop
>
> not sure why gsum's advantage ratio over asum seems to be roughly
> constant, but, this IS what I measure!-)
Great! This is a plus for iterator comprehensions (we need a better
term BTW). I guess that building up a list using repeated append()
calls slows things down more than the frame switching used by
generator functions; I knew the latter was fast but this is a pleasant
result.
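The comparison above can be re-run directly with `timeit` as a module. This is a minimal sketch rather than the original a.py: it uses the generator-expression syntax that eventually came out of this discussion (PEP 289), and the function names simply mirror Alex's.

```python
import timeit

def asum(R):
    # List-comprehension version: materializes the full list, then sums it.
    return sum([x * x for x in R])

def gsum(R):
    # Generator version: sum() pulls values one at a time; no list is built.
    return sum(x * x for x in R)

R = range(10000)
t_list = timeit.timeit(lambda: asum(R), number=100)
t_gen = timeit.timeit(lambda: gsum(R), number=100)
print("list: %.4fs  gen: %.4fs" % (t_list, t_gen))
```

Both compute the same result; only the intermediate storage differs.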
BTW, if I use a different function that calculates list() instead of
sum(), the generator version is a few percent slower than the list
comprehension. But that's because list(a) has a shortcut in case a is
a list, while sum(a) always uses PyIter_Next(). So this is actually
consistent: despite the huge win of the shortcut, the generator
version is barely slower.
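The list(a) shortcut is observable from pure Python as well: given a real list, list() can copy it in one preallocated pass, whereas draining a plain iterator has to fetch each element through the iterator protocol (PyIter_Next at the C level). A small illustrative sketch, with arbitrary sizes:

```python
import timeit

a = list(range(10000))

# list(a): the argument is a real list, so the fast copy path applies.
t_direct = timeit.timeit(lambda: list(a), number=1000)

# list(iter(a)): the argument is only an iterator, so elements are
# fetched one by one through the iterator protocol.
t_iter = timeit.timeit(lambda: list(iter(a)), number=1000)
print("direct: %.4fs  via iterator: %.4fs" % (t_direct, t_iter))
```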
I think the answer lies in the bytecode:
>>> def lc(a):
...     return [x for x in a]
...
>>> import dis
>>> dis.dis(lc)
  2           0 BUILD_LIST               0
              3 DUP_TOP
              4 LOAD_ATTR                0 (append)
              7 STORE_FAST               1 (_[1])
             10 LOAD_FAST                0 (a)
             13 GET_ITER
        >>   14 FOR_ITER                16 (to 33)
             17 STORE_FAST               2 (x)
             20 LOAD_FAST                1 (_[1])
             23 LOAD_FAST                2 (x)
             26 CALL_FUNCTION            1
             29 POP_TOP
             30 JUMP_ABSOLUTE           14
        >>   33 DELETE_FAST              1 (_[1])
             36 RETURN_VALUE
             37 LOAD_CONST               0 (None)
             40 RETURN_VALUE
>>> def gen(a):
...     for x in a: yield x
...
>>> dis.dis(gen)
  2           0 SETUP_LOOP              18 (to 21)
              3 LOAD_FAST                0 (a)
              6 GET_ITER
        >>    7 FOR_ITER                10 (to 20)
             10 STORE_FAST               1 (x)
             13 LOAD_FAST                1 (x)
             16 YIELD_VALUE
             17 JUMP_ABSOLUTE            7
        >>   20 POP_BLOCK
        >>   21 LOAD_CONST               0 (None)
             24 RETURN_VALUE
>>>
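The dumps above are from a 2003-era CPython; the exact opcodes have changed considerably since (list comprehensions, for instance, later gained a dedicated LIST_APPEND opcode). The same inspection can be repeated on whatever interpreter is at hand:

```python
import dis

def lc(a):
    return [x for x in a]

def gen(a):
    for x in a:
        yield x

# dis.get_instructions() yields the instructions programmatically,
# so the two loop bodies can be compared without reading a dump by eye.
lc_ops = [ins.opname for ins in dis.get_instructions(lc)]
gen_ops = [ins.opname for ins in dis.get_instructions(gen)]
print("list comp:", lc_ops)
print("generator:", gen_ops)
```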
The list comprehension executes 7 bytecodes per iteration; the
generator version only 5 (this could be more of course if the
expression was more complicated than 'x'). The YIELD_VALUE does very
little work; falling out of the frame is like falling off a log; and
gen_iternext() is pretty sparse code too. On the list comprehension
side, calling the list's append method has a bunch of overhead. (Some
of which could be avoided if we had a special-purpose opcode which
called PyList_Append().)
But the executive summary remains: the generator wins because it
doesn't have to materialize the whole list.
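That summary can be made concrete with `tracemalloc` (a module added to Python long after this mail); the numbers below are peak allocations, not speed:

```python
import tracemalloc

def via_list(R):
    return sum([x * x for x in R])   # builds the full list first

def via_gen(R):
    return sum(x * x for x in R)     # keeps only one value in flight

R = range(100000)

tracemalloc.start()
via_list(R)
_, peak_list = tracemalloc.get_traced_memory()
tracemalloc.stop()

tracemalloc.start()
via_gen(R)
_, peak_gen = tracemalloc.get_traced_memory()
tracemalloc.stop()

print("peak list: %d bytes  peak gen: %d bytes" % (peak_list, peak_gen))
```

The generator's peak stays roughly constant regardless of len(R), while the list version's peak grows linearly with it.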
--Guido van Rossum (home page: http://www.python.org/~guido/)