[Python-ideas] Optimizing global names via dict references
Andrew Barnert
abarnert at yahoo.com
Sat Dec 19 13:30:57 EST 2015
On Dec 19, 2015, at 08:33, Franklin? Lee <leewangzhong+python at gmail.com> wrote:
>
> I messed up the links. Well, I learned something about mailman.
>
> On Sat, Dec 19, 2015 at 10:06 AM, Victor Stinner
> <victor.stinner at gmail.com> wrote:
>> Le samedi 19 décembre 2015, Franklin? Lee <leewangzhong+python at gmail.com> a
>> écrit :
>>>
>>> == Problem statement ==
>>>
>>> I propose a CPython interpreter optimization for global name lookups.
>>
>>
>> Fans of micro optimisation are probably already using various hacks like
>> func(len=len): ... to avoid the lookup at runtime. There is an option
>> rewriting bytecode to replace load_global with load_const (I don't recall
>> the name of the PyPI project, it's a decorator).
>>
>> Serhiy also proposed to implement a new syntax to make the lookup when the
>> function is defined.
>
> I mentioned the hack, and said this would make it unnecessary. I also
> mentioned how the hack bypasses dynamic lookup for the sake of
> performance, while this would have the effect of dynamic lookup. It
> could possibly be faster (since you don't have to load a default arg).
On Dec 19, 2015, at 08:33, Franklin? Lee <leewangzhong+python at gmail.com> wrote:
>
> I mentioned the hack, and said this would make it unnecessary. I also
> mentioned how the hack bypasses dynamic lookup for the sake of
> performance, while this would have the effect of dynamic lookup. It
> could possibly be faster (since you don't have to load a default arg).
It can't possibly be faster, unless you never actually use the len function (in which case it would be pretty stupid to optimize with len=len).
Loading a default value is just copying from one array to another at call time, with pseudocode like "frame.localsplus[3] = function.defaults[1]". That takes about as much time as two locals lookups (and then turns every builtin lookup into a local lookup).
Meanwhile, your idea replaces every locals lookup with a RefCell lookup plus a dereference, something like "frame.code.consts[1].value" instead of "frame.localsplus[3]". Loading off consts is about as fast as loading off locals, but that extra cell dereference step can't possibly be free, and in fact will be on the same order as the array lookup.
As a rough guess, since your RefCell system is pretty close to what already happens for closure cells, it'll probably be about as fast as closure lookups--which are significantly faster than global lookups, meaning the optimization would help, but not as fast as local lookups, meaning the optimization would not help as much as the default value hack.
And that seems like an almost unavoidable cost of keeping it dynamic.
The FAT idea avoids that cost by faking the dynamic nature: it caches the value as a const, but the cache is wiped out if the value changes and the guard is tripped, so in cases where you don't ever change len it's nearly as fast the default value hack, but if you do change len it's as slow as a traditional global lookup (because, in fact, it's just falling back to the unoptimized code that does a traditional global lookup). And, since you asked about PyPy, I'd be willing to bet that what it does is a lot closer to FAT than to your idea.
> It would improve all functions which use globals, and doesn't require
> the programmer to bypass dynamic lookup. No new syntax > new syntax.
> Not having to use a hack > using a hack.
But more complex under-the-covers implementation < simple implementation. Unless it actually significantly speeds up a reasonable amount of real-life code whose speed matters, that's not worth it.
I suppose one way you could estimate the potential benefit is to try to figure out how common the default hack is in real life. If people need to use it frequently in production code (which presumably means there's other production code that could be benefitting from it but the programmer didn't realize it), then making it unnecessary is a real win.
>> Since the goal is a speedup, a working proof of concept is required to show
>> that it works and it's faster (on macro benchmarks?). Do you feel able to
>> implement it?
>
> It would be a project. I've studied some of the dict code, but I'd
> still have to look at OrderedDict and how it differs. I'd have to hack
> at the interpreter level. Worst, I'm not very experienced at C, and
> have no real "development" experience, so I'd struggle with just
> compiling CPython. But I'll definitely try, because I won't get
> anywhere by hoping someone else implements what I think of.
Something you might want to consider doing first:
Write a pure-python RefCell, RefCellDict, and RefCellChainDict. Write some code that explicitly uses a ChainMap to look things up instead of using globals and builtins. Then rewrite it as equivalent code that manually sets up a list of RefCells out of a RefCellChainDict and then uses the list. You can simulate "global" and "builtin" lookups and mutations, and see when the optimization makes things faster. If it often makes a big difference, that should inspire you to do the hard work in C.
Also, it means you have real code instead of pseudocode to show off and let people play with.
In fact, you could even write a pure-Python optimizer out of this without inspecting the bytecode: for each def/lambda/etc. node, compile the AST normally, get the co_names, then rewrite the AST to build and use a list of RefCells, then compile the module and replace its dict with a RefCellDict, and now all globals in the module use RefCell lookups. (Making that work for builtins might be trickier, because other modules can change that... But for a first pass, you can either leave builtins out, or assume that no code changes it after startup and throw in some asserts for tests that break that assumption.) That would allow you to run realistic benchmarks. (If you're worried that the pure-Python objects cells and dicts are too slow, Cython should help there.) And it might even be a useful project on its own if it could optimize real-life code in CPython 3.5 and 2.7, PyPy, Jython, etc.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20151219/f2b2f9a6/attachment-0001.html>
More information about the Python-ideas
mailing list