
On 21.01.2016 14:19, Victor Stinner wrote:
2016-01-21 10:39 GMT+01:00 M.-A. Lemburg <mal@egenix.com>:
I ran performance tests on these optimization tricks (and others) in 2014. See this talk:
http://www.egenix.com/library/presentations/PyCon-UK-2014-When-performance-m... (slides 33ff.)
Ah nice, thanks for the slides.
Forgot to mention the benchmarks I used: https://github.com/egenix/when-performance-matters
The keyword trick doesn't really pay off in terms of added performance vs. danger of introducing weird bugs.
I ran a quick microbenchmark to measure the cost of LOAD_GLOBAL to load a global: call func("abc") with
    mylen = len

    def func(obj):
        return mylen(obj)
Result:
    117 ns: original bytecode (LOAD_GLOBAL)
    109 ns: LOAD_CONST
    116 ns: LOAD_CONST with guard
LOAD_CONST avoids 1 dict lookup (globals) and reduces the runtime by 8 ns: 7% faster. But the guard has a cost of 7 ns: we only win 1 nanosecond. Not really interesting here.
LOAD_CONST means that the LOAD_GLOBAL instruction has been replaced with a LOAD_CONST instruction. The guard checks if the frame globals and globals()['mylen'] didn't change.
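For anyone who wants to see the instruction being discussed, here is a minimal sketch using only the stdlib dis and timeit modules. It does not reproduce the LOAD_CONST variants above (those need a patched code object); it only confirms that the name is fetched with LOAD_GLOBAL and gives a rough per-call time. The repeat and call counts are arbitrary choices, and the timing will differ from the numbers quoted here depending on machine and Python version.

```python
import dis
import timeit

mylen = len  # module-level alias, as in the first benchmark


def func(obj):
    return mylen(obj)


# Names that func's bytecode fetches with LOAD_GLOBAL.
global_loads = [
    ins.argval
    for ins in dis.get_instructions(func)
    if ins.opname == "LOAD_GLOBAL"
]
print(global_loads)

# Rough per-call cost in nanoseconds: best of 5 runs of 200,000 calls.
number = 200_000
timer = timeit.Timer('f("abc")', globals={"f": func})
best = min(timer.repeat(repeat=5, number=number))
print(f"{best / number * 1e9:.0f} ns per call")
```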
I ran a second microbenchmark to measure the cost of LOAD_GLOBAL to load a builtin: call func("abc") with

    def func(obj):
        return len(obj)
Result:
    124 ns: original bytecode (LOAD_GLOBAL)
    107 ns: LOAD_CONST
    116 ns: LOAD_CONST with guard on builtins + globals

LOAD_CONST avoids 2 dict lookups (globals, builtins) and reduces the runtime by 17 ns: 14% faster. But the guard has a cost of 9 ns: we win 8 nanoseconds, 6% faster.
Here the guard is more complex: it checks that the frame builtins, the frame globals, builtins.__dict__['len'] and globals()['len'] didn't change.
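To make the guard idea concrete, here is a pure-Python emulation. It is only a sketch of the logic, not the actual implementation: the real guard would run in C around the specialized bytecode, and the names NameGuard, guarded_call and _cached_len are hypothetical. It also does not check that the globals or builtins dict objects themselves were replaced, only the entries inside them.

```python
import builtins

_MISSING = object()  # sentinel for "name not present in this namespace"


class NameGuard:
    """Remember what a name was bound to and report whether it changed."""

    def __init__(self, namespace, name):
        self.namespace = namespace
        self.name = name
        self.snapshot = namespace.get(name, _MISSING)

    def check(self):
        return self.namespace.get(self.name, _MISSING) is self.snapshot


def func(obj):
    return len(obj)


# Value that a LOAD_CONST specialization would have baked in.
_cached_len = builtins.len

# One guard per namespace that could change what "len" resolves to.
guards = [
    NameGuard(func.__globals__, "len"),
    NameGuard(vars(builtins), "len"),
]


def guarded_call(obj):
    if all(guard.check() for guard in guards):
        return _cached_len(obj)  # fast path: no dict lookup for the call target
    return func(obj)  # a guard failed: fall back to the normal lookup
```

If a module-level len is later defined, the first guard fails and guarded_call falls back to func, so the usual Python semantics are preserved at the price of the extra checks.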
If you avoid guards, it's always faster, but it changes the Python semantics.
The speedup on such a very small example is low. It's more interesting when the global or builtin variable is used in a loop: the speedup is multiplied by the number of loop iterations.
Sure, but for those, you'd probably simply use the in-function localization:

    def f(seq):
        z = 0
        local_len = len
        for x in seq:
            if x:
                z += local_len(x)
        return z

This results in a LOAD_FAST inside the loop and is probably the better way to speed things up.
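A quick way to check the localization claim is to disassemble the function. The sketch below only looks at LOAD_GLOBAL, because the exact opcode names used for local loads vary between CPython versions. It shows that len is fetched from the globals/builtins exactly once, before the loop, and that local_len is never looked up as a global.

```python
import dis


def f(seq):
    z = 0
    local_len = len  # the only global/builtin lookup, done once
    for x in seq:
        if x:
            z += local_len(x)  # local variable access inside the loop
    return z


# Names that f's bytecode fetches with LOAD_GLOBAL.
global_loads = [
    ins.argval
    for ins in dis.get_instructions(f)
    if ins.opname == "LOAD_GLOBAL"
]
print(global_loads)
print(f(["ab", "", "cde"]))
```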
A decorator could help with this (by transforming the byte code and localizing the symbols), e.g.
    @localize(len)
    def f(seq):
        z = 0
        for x in seq:
            if x:
                z += len(x)
        return z

FYI https://pypi.python.org/pypi/codetransformer has such a decorator: @asconstants(len=len).
Interesting :-)
All that said, I don't really believe that this is a high priority feature request. The performance gain is not all that great and only becomes relevant in tight loops.
Yeah, in the Python stdlib, the hack is only used for loops.
Right. The only advantage I'd see in having a keyword to "configure" the behavior is that you could easily apply the change to a whole module/function without having to add explicit localizations everywhere.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Experts (#1, Jan 21 2016)