On 2016-01-29 5:00 AM, Stefan Behnel wrote:
Yury Selivanov wrote on 27.01.2016 at 19:25:
[..]
LOAD_METHOD looks at the object on top of the stack, and checks if the name resolves to a method or to a regular attribute. If it's a method, then we push the unbound method object and the object to the stack. If it's an attribute, we push the resolved attribute and NULL.
When CALL_METHOD looks at the stack it knows how to call the unbound method properly (pushing the object as a first arg), or how to call a regular callable.
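The stack protocol described above can be modelled in a few lines of Python. This is only a rough sketch under stated assumptions, not the actual patch: the names `NULL`, `_find_on_type`, `load_method`, `call_method` and `run_call` are hypothetical, the "stack" is a plain list, and only plain Python functions defined on the class (and not shadowed by an instance attribute) are treated as methods. Everything else falls back to a normal attribute lookup.

```python
import types

NULL = object()  # hypothetical stand-in for the C-level NULL marker


def _find_on_type(tp, name):
    """Return the raw class attribute without binding it, or NULL."""
    for klass in tp.__mro__:
        if name in vars(klass):
            return vars(klass)[name]
    return NULL


def load_method(stack, name):
    """Pop an object; push (function, obj) for a method, else (attr, NULL)."""
    obj = stack.pop()
    raw = _find_on_type(type(obj), name)
    shadowed = name in getattr(obj, "__dict__", {})
    if isinstance(raw, types.FunctionType) and not shadowed:
        # Method case: no bound method object is created.
        stack.append(raw)
        stack.append(obj)
    else:
        # Regular attribute case: resolve normally and push the marker.
        stack.append(getattr(obj, name))
        stack.append(NULL)


def call_method(stack, argc):
    """Pop argc positional args plus the pair left by load_method, then call."""
    args = [stack.pop() for _ in range(argc)]
    args.reverse()
    self_or_null = stack.pop()
    func = stack.pop()
    if self_or_null is NULL:
        result = func(*args)                # regular callable
    else:
        result = func(self_or_null, *args)  # object passed as first argument
    stack.append(result)


def run_call(obj, name, *args):
    """Simulate `obj.name(*args)` with the two operations above."""
    stack = [obj]
    load_method(stack, name)
    stack.extend(args)
    call_method(stack, len(args))
    return stack.pop()


class Point:
    def __init__(self, x):
        self.x = x

    def scale(self, k):
        return self.x * k
```

With this model, `run_call(Point(2), "scale", 3)` takes the method path, while a callable stored in the instance dictionary or a built-in method such as `str.upper` takes the attribute path with the `NULL` marker.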
This idea does make CPython around 2-4% faster. And it surely doesn't make it slower. I think it's a safe bet to at least implement this optimization in CPython 3.6.
So far, the patch only optimizes positional-only method calls. It's possible to optimize all kinds of calls, but this will necessitate 3 more opcodes (explained in the issue). We'll need to do some careful benchmarking to see if it's really needed.
I implemented a similar but simpler optimisation in Cython a while back:
http://blog.behnel.de/posts/faster-python-calls-in-cython-021.html
Instead of avoiding the creation of method objects, as you proposed, it just normally calls getattr and if that returns a bound method object, it uses inlined calling code that avoids re-packing the argument tuple. Interestingly, I got speedups of 5-15% for some of the Python benchmarks, but I don't quite remember which ones (at least raytrace and richards, I think), nor do I recall the overall gain, which (I assume) is what you are referring to with your 2-4% above. Might have been in the same order.
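The Cython approach described above can be illustrated in pure Python. This is a minimal sketch of the idea only, not the Cython implementation (which does this in generated C code): the helper name `call_attribute` is hypothetical, and in pure Python the `*args` handling still builds a tuple, so the sketch shows the control flow rather than the speedup.

```python
import types


def call_attribute(obj, name, *args):
    """Look up obj.name normally, then unpack a bound method before calling."""
    attr = getattr(obj, name)  # ordinary lookup; may create a bound method
    if isinstance(attr, types.MethodType):
        # Bound method: call the underlying function directly with the
        # instance as first argument instead of going through the wrapper.
        return attr.__func__(attr.__self__, *args)
    # Anything else (plain function, built-in, callable object) is called as-is.
    return attr(*args)


class Counter:
    def __init__(self):
        self.total = 0

    def add(self, n):
        self.total += n
        return self.total
```

For example, `call_attribute(Counter(), "add", 5)` goes through the bound-method branch, while `call_attribute([3, 1, 2], "count", 1)` uses the generic call because list methods are built-ins rather than `types.MethodType` instances.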
That's great! I'm still working on the patch, but so far it looks like adding just LOAD_METHOD/CALL_METHOD (that avoid instantiating BoundMethods) gives us 10-15% faster method calls. Combining them with my opcode cache makes them 30-35% faster.
Yury