Speed of Nested Functions & Lambda Expressions

beginner zyzhu2000 at gmail.com
Wed Oct 24 17:38:30 CEST 2007


On Oct 24, 2:52 am, Duncan Booth <duncan.bo... at invalid.invalid> wrote:
> beginner <zyzhu2... at gmail.com> wrote:
> > It is really convenient to use nested functions and lambda
> > expressions. What I'd like to know is whether Python compiles fn_inner()
> > only once and changes the binding of v every time fn_outer() is called,
> > or whether Python compiles and generates a new function object every
> > time. If it is the latter, will there be a huge performance hit? Would
> > someone give some hints about how exactly Python does this internally?
>
> You can use Python's bytecode disassembler to see what actually gets
> executed here:
>
> >>> def fn_outer(v):
>     a=v*2
>     def fn_inner():
>         print "V:%d,%d" % (v,a)
>
>     fn_inner()
>
> >>> import dis
> >>> dis.dis(fn_outer)
>
>   2           0 LOAD_DEREF               1 (v)
>               3 LOAD_CONST               1 (2)
>               6 BINARY_MULTIPLY    
>               7 STORE_DEREF              0 (a)
>
>   3          10 LOAD_CLOSURE             0 (a)
>              13 LOAD_CLOSURE             1 (v)
>              16 BUILD_TUPLE              2
>              19 LOAD_CONST               2 (<code object fn_inner at 01177218, file "<pyshell#3>", line 3>)
>              22 MAKE_CLOSURE             0
>              25 STORE_FAST               1 (fn_inner)
>
>   6          28 LOAD_FAST                1 (fn_inner)
>              31 CALL_FUNCTION            0
>              34 POP_TOP            
>              35 LOAD_CONST               0 (None)
>              38 RETURN_VALUE        
>
>
>
> When you execute the 'def' statement, the two scoped variables a and v
> are built into a tuple on the stack, the compiled code object for the
> inner function is also pushed onto the stack and then the function is
> created by the 'MAKE_CLOSURE' instruction. This is then stored in a
> local variable (STORE_FAST) which is then loaded and called.
>
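A minimal sketch of what this means for the original question, assuming CPython and written for Python 3 (so the inner function returns the string instead of using the Python 2 print statement, and fn_outer is changed to return fn_inner so it can be inspected). It suggests that the body of fn_inner is compiled once, while each call to fn_outer builds a new, cheap function object around that same code object:

```python
def fn_outer(v):
    a = v * 2

    def fn_inner():
        # v and a are free variables, captured through closure cells
        return "V:%d,%d" % (v, a)

    # Returned (unlike the original) so the function object can be inspected
    return fn_inner


f1 = fn_outer(1)
f2 = fn_outer(2)

# One compiled code object, shared by every function object created from it
same_code = f1.__code__ is f2.__code__

# A distinct function object per call to fn_outer, each with its own cells
same_func = f1 is f2

# Names of the captured variables, as recorded on the shared code object
free_vars = sorted(f1.__code__.co_freevars)

results = (f1(), f2())
print(same_code, same_func, free_vars, results)
```

Here same_code is True and same_func is False: only the small wrapper object and its closure tuple are rebuilt on each call.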
> So the function definition is pretty fast, BUT notice how fn_inner is
> referenced by STORE_FAST/LOAD_FAST whereas a and v are referenced by
> LOAD_DEREF/STORE_DEREF and LOAD_CLOSURE.
>
> The code for fn_inner also uses LOAD_DEREF to get at the scoped
> variables:
>
>   4           0 LOAD_CONST               1 ('V:%d,%d')
>               3 LOAD_DEREF               1 (v)
>               6 LOAD_DEREF               0 (a)
>               9 BUILD_TUPLE              2
>              12 BINARY_MODULO      
>              13 PRINT_ITEM          
>              14 PRINT_NEWLINE      
>              15 LOAD_CONST               0 (None)
>              18 RETURN_VALUE        
>
> (It's a bit harder to disassemble that one; I stuck a call to dis.dis
> inside fn_outer to get that.)
>
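One possible alternative to editing fn_outer, sketched here for Python 3 on CPython: the compiled code object for the inner function is stored among the outer function's constants, so it can be pulled out of co_consts and handed to the dis module directly. The opcode names printed by a modern interpreter will differ in places from the Python 2 listing above.

```python
import dis
import types


def fn_outer(v):
    a = v * 2

    def fn_inner():
        return "V:%d,%d" % (v, a)

    fn_inner()


# The nested function's code object is one of fn_outer's constants
inner_code = next(
    const
    for const in fn_outer.__code__.co_consts
    if isinstance(const, types.CodeType)
)

# Collect opcode names without calling or modifying fn_outer
opnames = [ins.opname for ins in dis.get_instructions(inner_code)]

# Or print the usual human-readable listing
dis.dis(inner_code)
```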
> If you do some timings you'll find that LOAD_DEREF/STORE_DEREF are
> rather slower than LOAD_FAST/STORE_FAST, so while the overhead for
> creating the function is minimal you could find that if you access the
> variables a lot (even in fn_outer) there may be a measurable slow-down.
>
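A rough way to check this on your own interpreter is a timeit comparison along the following lines. This is only a sketch for Python 3: make_closure_reader and local_reader are illustrative names, the loop sizes are arbitrary, and the size of the gap (or whether there is one at all) depends on the Python version and the machine, so no expected numbers are given.

```python
import timeit


def make_closure_reader():
    x = 1

    def read():
        total = 0
        for _ in range(1000):
            total += x  # x is a free variable, read through a closure cell
        return total

    return read


def local_reader():
    x = 1
    total = 0
    for _ in range(1000):
        total += x  # x is an ordinary local variable
    return total


closure_reader = make_closure_reader()

closure_time = timeit.timeit(closure_reader, number=200)
local_time = timeit.timeit(local_reader, number=200)

print("closure variable: %.4f s" % closure_time)
print("local variable:   %.4f s" % local_time)
```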
> If timings show that it is a code hotspot then you might find it better
> to nest the function but pass any required values in as parameters (but
> if you don't have evidence for this just write whatever is clearest).
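The suggestion of passing values in as parameters could look something like this (a Python 3 sketch; the return values are an addition so the result can be checked). Because fn_inner no longer refers to names from the enclosing scope, v and a remain plain local variables in both functions and no closure cells are created:

```python
def fn_outer(v):
    a = v * 2

    def fn_inner(v, a):
        # v and a are ordinary parameters here, not free variables
        return "V:%d,%d" % (v, a)

    return fn_inner(v, a)


result = fn_outer(3)

# Nothing in fn_outer is captured by a nested function any more
cell_vars = fn_outer.__code__.co_cellvars

print(result, cell_vars)
```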


Thanks for the detailed analysis, Duncan. Also thanks for showing how
the disassembler can be used to figure this out. I was just looking
for a tool like this. This is great. Thanks again.



