
with a.py having (I've added the missing `return`s so the functions actually hand back their results):

    def asum(R):
        return sum([x*x for x in R])

    def gen(R):
        for x in R:
            yield x*x

    def gsum(R, gen=gen):
        return sum(gen(R))
I measure:
    [alex@lancelot auto]$ timeit.py -c -s'import a' -s'R=range(100)' 'a.asum(R)'
    10000 loops, best of 3: 96 usec per loop
    [alex@lancelot auto]$ timeit.py -c -s'import a' -s'R=range(100)' 'a.gsum(R)'
    10000 loops, best of 3: 60 usec per loop
    [alex@lancelot auto]$ timeit.py -c -s'import a' -s'R=range(1000)' 'a.asum(R)'
    1000 loops, best of 3: 930 usec per loop
    [alex@lancelot auto]$ timeit.py -c -s'import a' -s'R=range(1000)' 'a.gsum(R)'
    1000 loops, best of 3: 590 usec per loop
    [alex@lancelot auto]$ timeit.py -c -s'import a' -s'R=range(10000)' 'a.asum(R)'
    100 loops, best of 3: 1.28e+04 usec per loop
    [alex@lancelot auto]$ timeit.py -c -s'import a' -s'R=range(10000)' 'a.gsum(R)'
    100 loops, best of 3: 8.4e+03 usec per loop
not sure why gsum's advantage ratio over asum seems to be roughly constant, but this IS what I measure!-)
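For readers following along on a modern Python, the same comparison can be reproduced with the timeit module and a generator expression (a construct that did not yet exist when this thread was written). This is a sketch under that assumption; absolute timings will of course vary by machine and interpreter version:

```python
import timeit

R = range(10000)

def asum(R):
    # builds the whole list of squares in memory, then sums it
    return sum([x*x for x in R])

def gsum(R):
    # generator expression: values flow into sum() one at a time
    return sum(x*x for x in R)

assert asum(R) == gsum(R)  # identical results either way

t_list = timeit.timeit(lambda: asum(R), number=200)
t_gen = timeit.timeit(lambda: gsum(R), number=200)
print("list comp: %.4fs  generator: %.4fs" % (t_list, t_gen))
```

On current interpreters the gap between the two is typically narrower than the roughly 1.5x ratio measured above, but the memory-footprint argument for the generator version still holds.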
Great! This is a plus for iterator comprehensions (we need a better term, BTW). I guess that building up a list using repeated append() calls slows things down more than the frame switching used by generator functions; I knew the latter was fast, but this is a pleasant result.

BTW, if I use a different function that calculates list() instead of sum(), the generator version is a few percent slower than the list comprehension. But that's because list(a) has a shortcut in case a is a list, while sum(a) always uses PyIter_Next(). So this is actually consistent: despite the huge win of the shortcut, the generator version is barely slower.

I think the answer lies in the bytecode:
    def lc(a):
        return [x for x in a]

    import dis
    dis.dis(lc)
      2           0 BUILD_LIST               0
                  3 DUP_TOP
                  4 LOAD_ATTR                0 (append)
                  7 STORE_FAST               1 (_[1])
                 10 LOAD_FAST                0 (a)
                 13 GET_ITER
            >>   14 FOR_ITER                16 (to 33)
                 17 STORE_FAST               2 (x)
                 20 LOAD_FAST                1 (_[1])
                 23 LOAD_FAST                2 (x)
                 26 CALL_FUNCTION            1
                 29 POP_TOP
                 30 JUMP_ABSOLUTE           14
            >>   33 DELETE_FAST              1 (_[1])
                 36 RETURN_VALUE
                 37 LOAD_CONST               0 (None)
                 40 RETURN_VALUE

    def gen(a):
        for x in a:
            yield x
    dis.dis(gen)
      2           0 SETUP_LOOP              18 (to 21)
                  3 LOAD_FAST                0 (a)
                  6 GET_ITER
            >>    7 FOR_ITER                10 (to 20)
                 10 STORE_FAST               1 (x)
                 13 LOAD_FAST                1 (x)
                 16 YIELD_VALUE
                 17 JUMP_ABSOLUTE            7
            >>   20 POP_BLOCK
            >>   21 LOAD_CONST               0 (None)
                 24 RETURN_VALUE
The list comprehension executes 7 bytecodes per iteration; the generator version only 5 (either count could be higher, of course, if the expression were more complicated than 'x'). The YIELD_VALUE does very little work; falling out of the frame is like falling off a log; and gen_iternext() is pretty sparse code too. On the list comprehension side, calling the list's append method has a bunch of overhead. (Some of which could be avoided if we had a special-purpose opcode which called PyList_Append().) But the executive summary remains: the generator wins because it doesn't have to materialize the whole list.

--Guido van Rossum (home page: http://www.python.org/~guido/)
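As it happens, the special-purpose opcode wished for above did arrive: later CPython versions compile comprehensions with a dedicated LIST_APPEND instruction instead of the LOAD_ATTR/CALL_FUNCTION pair shown in the 2.3-era disassembly. A small sketch to verify this on a current interpreter (the `opnames` helper is my own; it walks nested code objects because some Python 3 versions compile the comprehension body as a separate inner function rather than inline):

```python
import dis

def lc(a):
    return [x for x in a]

def opnames(code):
    # collect opcode names from a code object and any code objects
    # nested in its constants (e.g. a comprehension's inner function)
    names = [ins.opname for ins in dis.get_instructions(code)]
    for const in code.co_consts:
        if hasattr(const, "co_code"):
            names.extend(opnames(const))
    return names

ops = opnames(lc.__code__)
# modern CPython appends via LIST_APPEND, with no attribute lookup
# or method-call overhead per element
print("LIST_APPEND" in ops)
```

The per-iteration cost of the list comprehension is therefore lower today than in the disassembly above, though the executive summary is unchanged: the generator still avoids materializing the whole list.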