[Python-Dev] accumulator display syntax
Guido van Rossum
guido at python.org
Tue Oct 21 00:09:11 EDT 2003
> with a.py having:
> def asum(R):
>     return sum([ x*x for x in R ])
>
> def gen(R):
>     for x in R: yield x*x
> def gsum(R, gen=gen):
>     return sum(gen(R))
>
> I measure:
>
> [alex at lancelot auto]$ timeit.py -c -s'import a' -s'R=range(100)' 'a.asum(R)'
> 10000 loops, best of 3: 96 usec per loop
> [alex at lancelot auto]$ timeit.py -c -s'import a' -s'R=range(100)' 'a.gsum(R)'
> 10000 loops, best of 3: 60 usec per loop
> [alex at lancelot auto]$ timeit.py -c -s'import a' -s'R=range(1000)' 'a.asum(R)'
> 1000 loops, best of 3: 930 usec per loop
> [alex at lancelot auto]$ timeit.py -c -s'import a' -s'R=range(1000)' 'a.gsum(R)'
> 1000 loops, best of 3: 590 usec per loop
> [alex at lancelot auto]$ timeit.py -c -s'import a' -s'R=range(10000)' 'a.asum(R)'
> 100 loops, best of 3: 1.28e+04 usec per loop
> [alex at lancelot auto]$ timeit.py -c -s'import a' -s'R=range(10000)' 'a.gsum(R)'
> 100 loops, best of 3: 8.4e+03 usec per loop
>
> not sure why gsum's advantage ratio over asum seems to be roughly
> constant, but, this IS what I measure!-)
Great! This is a plus for iterator comprehensions (we need a better
term BTW). I guess that building up a list using repeated append()
calls slows things down more than the frame switching used by
generator functions; I knew the latter was fast but this is a pleasant
result.
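The comparison above can be re-run directly with `timeit` as a module. This is a minimal sketch rather than the original a.py: it uses the generator-expression syntax that eventually came out of this discussion (PEP 289), and the function names simply mirror Alex's.

```python
import timeit

def asum(R):
    # List-comprehension version: materializes the full list, then sums it.
    return sum([x * x for x in R])

def gsum(R):
    # Generator version: sum() pulls values one at a time; no list is built.
    return sum(x * x for x in R)

R = range(10000)
t_list = timeit.timeit(lambda: asum(R), number=100)
t_gen = timeit.timeit(lambda: gsum(R), number=100)
print("list: %.4fs  gen: %.4fs" % (t_list, t_gen))
```

Both compute the same result; only the intermediate storage differs.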
BTW, if I use a different function that calculates list() instead of
sum(), the generator version is a few percent slower than the list
comprehension. But that's because list(a) has a shortcut in case a is
a list, while sum(a) always uses PyIter_Next(). So this is actually
consistent: despite the huge win of the shortcut, the generator
version is barely slower.
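The list(a) shortcut is observable from pure Python as well: given a real list, list() can copy it in one preallocated pass, whereas draining a plain iterator has to fetch each element through the iterator protocol (PyIter_Next at the C level). A small illustrative sketch, with arbitrary sizes:

```python
import timeit

a = list(range(10000))

# list(a): the argument is a real list, so the fast copy path applies.
t_direct = timeit.timeit(lambda: list(a), number=1000)

# list(iter(a)): the argument is only an iterator, so elements are
# fetched one by one through the iterator protocol.
t_iter = timeit.timeit(lambda: list(iter(a)), number=1000)
print("direct: %.4fs  via iterator: %.4fs" % (t_direct, t_iter))
```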
I think the answer lies in the bytecode:
>>> def lc(a):
...     return [x for x in a]
...
>>> import dis
>>> dis.dis(lc)
  2           0 BUILD_LIST               0
              3 DUP_TOP
              4 LOAD_ATTR                0 (append)
              7 STORE_FAST               1 (_[1])
             10 LOAD_FAST                0 (a)
             13 GET_ITER
        >>   14 FOR_ITER                16 (to 33)
             17 STORE_FAST               2 (x)
             20 LOAD_FAST                1 (_[1])
             23 LOAD_FAST                2 (x)
             26 CALL_FUNCTION            1
             29 POP_TOP
             30 JUMP_ABSOLUTE           14
        >>   33 DELETE_FAST              1 (_[1])
             36 RETURN_VALUE
             37 LOAD_CONST               0 (None)
             40 RETURN_VALUE
>>> def gen(a):
...     for x in a: yield x
...
>>> dis.dis(gen)
  2           0 SETUP_LOOP              18 (to 21)
              3 LOAD_FAST                0 (a)
              6 GET_ITER
        >>    7 FOR_ITER                10 (to 20)
             10 STORE_FAST               1 (x)
             13 LOAD_FAST                1 (x)
             16 YIELD_VALUE
             17 JUMP_ABSOLUTE            7
        >>   20 POP_BLOCK
        >>   21 LOAD_CONST               0 (None)
             24 RETURN_VALUE
>>>
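The dumps above are from a 2003-era CPython; the exact opcodes have changed considerably since (list comprehensions, for instance, later gained a dedicated LIST_APPEND opcode). The same inspection can be repeated on whatever interpreter is at hand:

```python
import dis

def lc(a):
    return [x for x in a]

def gen(a):
    for x in a:
        yield x

# dis.get_instructions() yields the instructions programmatically,
# so the two loop bodies can be compared without reading a dump by eye.
lc_ops = [ins.opname for ins in dis.get_instructions(lc)]
gen_ops = [ins.opname for ins in dis.get_instructions(gen)]
print("list comp:", lc_ops)
print("generator:", gen_ops)
```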
The list comprehension executes 7 bytecodes per iteration; the
generator version only 5 (this could be more of course if the
expression was more complicated than 'x'). The YIELD_VALUE does very
little work; falling out of the frame is like falling off a log; and
gen_iternext() is pretty sparse code too. On the list comprehension
side, calling the list's append method has a bunch of overhead. (Some
of which could be avoided if we had a special-purpose opcode which
called PyList_Append().)
But the executive summary remains: the generator wins because it
doesn't have to materialize the whole list.
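That summary can be made concrete with `tracemalloc` (a module added to Python long after this mail); the numbers below are peak allocations, not speed:

```python
import tracemalloc

def via_list(R):
    return sum([x * x for x in R])   # builds the full list first

def via_gen(R):
    return sum(x * x for x in R)     # keeps only one value in flight

R = range(100000)

tracemalloc.start()
via_list(R)
_, peak_list = tracemalloc.get_traced_memory()
tracemalloc.stop()

tracemalloc.start()
via_gen(R)
_, peak_gen = tracemalloc.get_traced_memory()
tracemalloc.stop()

print("peak list: %d bytes  peak gen: %d bytes" % (peak_list, peak_gen))
```

The generator's peak stays roughly constant regardless of len(R), while the list version's peak grows linearly with it.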
--Guido van Rossum (home page: http://www.python.org/~guido/)