[Python-ideas] Explicit variable capture list

Victor Stinner victor.stinner at gmail.com
Thu Jan 21 08:19:59 EST 2016


2016-01-21 10:39 GMT+01:00 M.-A. Lemburg <mal at egenix.com>:
> I ran performance tests on these optimization tricks (and
> others) in 2014. See this talk:
>
> http://www.egenix.com/library/presentations/PyCon-UK-2014-When-performance-matters/
> (slides 33ff.)

Ah nice, thanks for the slides.


> The keyword trick doesn't really pay off in terms of added
> performance vs. danger of introducing weird bugs.
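
For readers who have not seen it, here is a minimal sketch of the keyword trick and of one kind of weird bug it invites. The function name total_len and the sample data are made up for illustration; only the len=len default argument is the trick itself.

```python
def total_len(seq, len=len):
    # "len" is now a fast local, bound once when the def statement runs.
    z = 0
    for x in seq:
        if x:
            z += len(x)
    return z

words = ["ab", "", "cde"]

# Normal call: the default argument is the real builtin.
normal = total_len(words)
print(normal)  # 5

# The danger: a stray second positional argument silently replaces len,
# and the function returns a wrong result without raising an error.
wrong = total_len(words, lambda s: 1)
print(wrong)  # 2
```

The extra parameter also shows up in the signature and in help(), which is part of why the trick is usually reserved for hot loops.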

I ran a quick microbenchmark to measure the cost of using LOAD_GLOBAL
to load a global, calling func("abc") with:

   mylen = len
   def func(obj): return mylen(obj)

Result:

117 ns: original bytecode (LOAD_GLOBAL)
109 ns: LOAD_CONST
116 ns: LOAD_CONST with guard

LOAD_CONST avoids one dict lookup (globals) and reduces the runtime by
8 ns: 7% faster. But the guard costs 7 ns, so the net win is only 1 ns.
Not really interesting here.

LOAD_CONST means that the LOAD_GLOBAL instruction has been replaced
with a LOAD_CONST instruction. The guard checks that neither the frame
globals nor globals()['mylen'] has changed.
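
As a rough illustration of the two pieces described here, the sketch below first confirms with the dis module that the unoptimized function uses LOAD_GLOBAL, then mimics the guard in pure Python. make_guard is a hypothetical helper written for this example, not an existing API, and it only imitates the check; it is not how an optimizer would implement it.

```python
import dis

mylen = len

def func(obj):
    return mylen(obj)

# The unoptimized function looks up "mylen" in the globals on each call.
opnames = [ins.opname for ins in dis.get_instructions(func)]
uses_load_global = "LOAD_GLOBAL" in opnames

def make_guard(function, name):
    # Hypothetical stand-in for the guard: remember the globals dict and
    # the object currently bound to `name`, then check both later.
    expected_ns = function.__globals__
    expected_value = expected_ns[name]

    def guard():
        ns = function.__globals__
        return ns is expected_ns and ns.get(name) is expected_value

    return guard

guard = make_guard(func, "mylen")
ok_before = guard()          # nothing changed yet: the constant is valid

mylen = lambda obj: -1       # rebinding the global invalidates the guard
ok_after = guard()

mylen = len                  # restore the original binding
```

When the guard fails, an optimizer has to fall back to the original bytecode, which is why the guard has to run before the specialized code on each call.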


I ran a second microbenchmark to measure the cost of using LOAD_GLOBAL
to load a builtin, calling func("abc") with:

   def func(obj): return len(obj)

Result:

124 ns: original bytecode (LOAD_GLOBAL)
107 ns: LOAD_CONST
116 ns: LOAD_CONST with guard on builtins + globals

LOAD_CONST avoids 2 dict lookups (globals, builtins) and reduces the
runtime by 17 ns: 14% faster. But the guard has a cost of 9 ns: we win
8 nanoseconds, 6% faster.

Here the guard is more complex: it checks that the frame builtins, the
frame globals, builtins.__dict__['len'] and globals()['len'] didn't
change.
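
A pure-Python sketch of this more complex guard, under the same caveats: make_builtin_guard and _MISSING are names invented for the example, and builtins.__dict__ is used here as an approximation of the frame builtins. For a builtin, "globals()['len'] didn't change" means the name must stay absent from the globals, since any global named len would shadow the builtin.

```python
import builtins

def func(obj):
    return len(obj)

_MISSING = object()  # sentinel meaning "name is not in the dict"

def make_builtin_guard(function, name):
    # Hypothetical stand-in for the guard on builtins + globals.
    expected_globals = function.__globals__
    expected_builtins = builtins.__dict__
    expected_value = expected_builtins[name]

    def guard():
        return (
            function.__globals__ is expected_globals
            and builtins.__dict__ is expected_builtins
            # no global may shadow the builtin
            and expected_globals.get(name, _MISSING) is _MISSING
            # the builtin itself must still be the same object
            and expected_builtins.get(name, _MISSING) is expected_value
        )

    return guard

guard = make_builtin_guard(func, "len")
ok_before = guard()

len = lambda obj: 0      # a module-level "len" now shadows the builtin
ok_shadowed = guard()

del len                  # remove the shadowing global again
ok_restored = guard()
```

Four checks instead of two is consistent with the guard costing a bit more here (9 ns vs. 7 ns in the numbers above).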


If you avoid guards, it's always faster, but it changes the Python semantics.

The speedup on such a very small example is low. It's more interesting
when the global or builtin variable is used in a loop: the saving is
multiplied by the number of loop iterations.
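
To see the loop effect for yourself, something like the following can be used. It is only a sketch: total_global, total_local and the data size are arbitrary choices, a local alias stands in for the LOAD_CONST rewrite, and the timings depend heavily on the machine and the Python version (newer CPython releases cache global lookups, so the gap may be small or absent).

```python
import timeit

def total_global(seq):
    z = 0
    for x in seq:
        z += len(x)          # global + builtin lookup on every iteration
    return z

def total_local(seq):
    local_len = len          # one lookup, then a fast local in the loop
    z = 0
    for x in seq:
        z += local_len(x)
    return z

data = ["abc"] * 1000

t_global = timeit.timeit(lambda: total_global(data), number=200)
t_local = timeit.timeit(lambda: total_local(data), number=200)

print(f"global/builtin lookup: {t_global:.4f} s")
print(f"local alias:           {t_local:.4f} s")
```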


> A decorator could help with this (by transforming the byte
> code and localizing the symbols), e.g.
>
> @localize(len)
> def f(seq):
>     z = 0
>     for x in seq:
>        if x:
>            z += len(x)
>     return z

FYI https://pypi.python.org/pypi/codetransformer has such a decorator:
@asconstants(len=len).


> All that said, I don't really believe that this is a high
> priority feature request. The gained performance win is
> not all that great and only becomes relevant when used
> in tight loops.

Yeah, in the Python stdlib, the hack is only used for loops.

Victor

