
2016-01-21 10:39 GMT+01:00 M.-A. Lemburg <mal@egenix.com>:
> I ran performance tests on these optimization tricks (and others) in 2014. See this talk:
> http://www.egenix.com/library/presentations/PyCon-UK-2014-When-performance-m... (slides 33ff.)
Ah nice, thanks for the slides.
> The keyword trick doesn't really pay off in terms of added performance vs. danger of introducing weird bugs.
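This is a minimal sketch of what the "keyword trick" presumably refers to. A builtin is bound as a default argument value (`len=len`), so it is read as a local variable instead of through LOAD_GLOBAL. The function names are illustrative. The two "weird bug" cases are examples of how the trick changes semantics; they are not an exhaustive list.

```python
import builtins


def total_len_global(seq):
    # `len` is resolved on every iteration via LOAD_GLOBAL
    # (globals dict miss, then builtins dict hit)
    z = 0
    for x in seq:
        z += len(x)
    return z


def total_len_keyword(seq, len=len):
    # keyword trick: the builtin is bound once, when `def` runs,
    # and is then read as a local variable inside the loop
    z = 0
    for x in seq:
        z += len(x)
    return z


# Weird bug #1: the private optimization leaks into the public signature,
# so a caller can (accidentally) override it.
print(total_len_keyword(["ab", "c"]))                  # 3
print(total_len_keyword(["ab", "c"], lambda obj: 10))  # 20, not 3

# Weird bug #2: the value is frozen at definition time, so a later
# monkeypatch of builtins.len is not seen by the keyword version.
original_len = builtins.len
builtins.len = lambda obj: 0
try:
    print(total_len_global(["ab", "c"]))   # 0: sees the patched builtin
    print(total_len_keyword(["ab", "c"]))  # 3: still uses the original len
finally:
    builtins.len = original_len
```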
I ran a quick microbenchmark to measure the cost of LOAD_GLOBAL to load a global: call func("abc") with

    mylen = len

    def func(obj):
        return mylen(obj)

Result:

    117 ns: original bytecode (LOAD_GLOBAL)
    109 ns: LOAD_CONST
    116 ns: LOAD_CONST with guard

LOAD_CONST avoids 1 dict lookup (globals) and reduces the runtime by 8 ns: 7% faster. But the guard has a cost of 7 ns, so we only win 1 nanosecond. Not really interesting here.

LOAD_CONST means that the LOAD_GLOBAL instruction has been replaced with a LOAD_CONST instruction. The guard checks that the frame globals and globals()['mylen'] did not change.

I ran a second microbenchmark to measure the cost of LOAD_GLOBAL to load a builtin: call func("abc") with

    def func(obj):
        return len(obj)

Result:

    124 ns: original bytecode (LOAD_GLOBAL)
    107 ns: LOAD_CONST
    116 ns: LOAD_CONST with guard on builtins + globals

LOAD_CONST avoids 2 dict lookups (globals, builtins) and reduces the runtime by 17 ns: 14% faster. But the guard has a cost of 9 ns, so we win 8 nanoseconds: 6% faster.

Here the guard is more complex: it checks that the frame builtins, the frame globals, builtins.__dict__['len'] and globals()['len'] did not change.

If you avoid guards, it's always faster, but it changes the Python semantics.

The speedup on such a very small example is low. It's more interesting when the global or builtin variable is used in a loop: the speedup is multiplied by the number of loop iterations.
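Loading a builtin costs more than loading a global because LOAD_GLOBAL checks the globals dict first and falls back to the builtins dict only on a miss. A guard pays off only if it is cheaper than those lookups. The pure-Python model below assumes a version counter on each namespace, similar in spirit to PEP 509's private dict version. `VersionedDict`, `load_global` and `GuardedConst` are hypothetical names, and this is a simplified model, not the implementation measured above. It ties the guard to two specific dict objects, which stands in for the "frame globals/builtins did not change" check, and it tracks only `__setitem__` and `__delitem__`.

```python
import builtins


class VersionedDict(dict):
    """Hypothetical dict whose `version` changes on every tracked mutation.

    Only __setitem__ and __delitem__ are tracked; update(), pop(),
    clear(), etc. are not, so this is a model, not a complete guard.
    """

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.version = 0

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.version += 1

    def __delitem__(self, key):
        super().__delitem__(key)
        self.version += 1


def load_global(name, frame_globals, frame_builtins):
    """Model of LOAD_GLOBAL: 1 lookup for a global, 2 for a builtin."""
    try:
        return frame_globals[name]
    except KeyError:
        return frame_builtins[name]


class GuardedConst:
    """Model of 'LOAD_CONST with guard on builtins + globals'."""

    def __init__(self, name, frame_globals, frame_builtins):
        self.name = name
        self.frame_globals = frame_globals
        self.frame_builtins = frame_builtins
        self._specialize()

    def _specialize(self):
        # Do the slow lookup once and remember both namespace versions.
        self.value = load_global(self.name, self.frame_globals, self.frame_builtins)
        self.versions = (self.frame_globals.version, self.frame_builtins.version)

    def load(self):
        # The guard: compare two integers instead of doing dict lookups.
        if (self.frame_globals.version, self.frame_builtins.version) != self.versions:
            self._specialize()  # a namespace changed: redo the lookup
        return self.value


g = VersionedDict()                  # stands in for the frame globals
b = VersionedDict(len=builtins.len)  # stands in for builtins.__dict__
cached_len = GuardedConst("len", g, b)
print(cached_len.load()("abc"))      # 3: builtin len, guard passes
g["len"] = lambda obj: -1            # shadow the builtin in globals
print(cached_len.load()("abc"))      # -1: guard fails, lookup redone
```

Dropping the version check in `load()` would make it faster, but the shadowing in the last two lines would then go unnoticed. That is the semantic change mentioned above.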
> A decorator could help with this (by transforming the byte code and localizing the symbols), e.g.
>
>     @localize(len)
>     def f(seq):
>         z = 0
>         for x in seq:
>             if x:
>                 z += len(x)
>         return z
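Without a bytecode-transforming decorator, the same effect can be obtained by hand. The sketch below is a manual equivalent of the `@localize(len)` example. `localize` is only proposed in the message above and is not assumed to be an existing API. Here `len` is fetched with a single LOAD_GLOBAL per call and then read from a local variable inside the loop. The alias name `_len` is illustrative.

```python
def f(seq):
    # Manual equivalent of @localize(len): look `len` up once per call
    # (one LOAD_GLOBAL), then use the local alias inside the loop.
    _len = len
    z = 0
    for x in seq:
        if x:
            z += _len(x)
    return z


print(f(["ab", "", "cde"]))  # 5
```

Unlike the keyword trick, `_len` is not part of the signature. The alias is also refreshed on every call, so rebinding `len` between calls is still honored. Only a rebinding during a call is missed.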
FYI https://pypi.python.org/pypi/codetransformer has such a decorator: @asconstants(len=len).
> All that said, I don't really believe that this is a high priority feature request. The gained performance win is not all that great and only becomes relevant when used in tight loops.
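Whether the gain matters for a given loop can be checked with `timeit`. This rough sketch compares a loop that reads the builtin `len` through LOAD_GLOBAL on every iteration, the keyword trick, and a local alias, at a few input sizes. The function names and sizes are arbitrary. The absolute numbers, and the size of the gap, depend on the machine and the CPython version, so no particular result is claimed here.

```python
import timeit


def total_global(seq):
    z = 0
    for x in seq:
        z += len(x)  # LOAD_GLOBAL on every iteration
    return z


def total_keyword(seq, len=len):
    z = 0
    for x in seq:
        z += len(x)  # local variable bound by the default argument
    return z


def total_local(seq):
    _len = len  # one LOAD_GLOBAL per call
    z = 0
    for x in seq:
        z += _len(x)  # local variable on every iteration
    return z


def best_time(func, data, number=200, repeat=5):
    # Best of `repeat` runs, each calling func(data) `number` times.
    return min(timeit.repeat(lambda: func(data), number=number, repeat=repeat))


for n_items in (1, 10, 1000):
    data = ["abc"] * n_items
    timings = [best_time(func, data) for func in (total_global, total_keyword, total_local)]
    print(f"{n_items:5d} items: global {timings[0]:.5f}s, "
          f"keyword {timings[1]:.5f}s, local {timings[2]:.5f}s")
```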
Yeah, in the Python stdlib, the hack is only used for loops.

Victor