
I have an idea that may result in a significant optimization of Python. However, I have an incomplete understanding of what's involved, and I haven't had enough time to puzzle out enough of Python's internals to write up a full PEP describing this.

Method calls appear to be a full 20% slower (simple benchmark included) than function calls, and Python function calls are already pretty slow. By my understanding, one of the reasons for the difference is that if you have a method call like this:

    a = A()
    a.b()

what's really happening is something along the lines of:

    temp = new.instancemethod(getattr(a.__class__, "b"), a, A)
    temp()
    free(temp)

This causes an unnecessary memory allocation: the instancemethod object is created, immediately called, and then garbage collected. Looking at the output of dis.dis, I can see there are also 3 bytecodes being evaluated rather than 1.

My proposal is to treat method calls as syntactically different from function calls. Rather than x.y() being completely synonymous with getattr(x, "y")(), it could be analogous to 'x.y = z' or 'del x.y'. For symmetry with these statement types, the new bytecode could be called CALL_ATTR.

I think this is an important thing to consider as systems like Zope and Twisted move towards using component models and Interfaces in Python. The fact that direct function calls are so much faster puts efficiency directly at odds with structured flexibility. With a method call primitive comparable in cost to function calls, most Python code, especially in systems that make heavy use of inter-object communication patterns, would immediately get as much as a 15% speed boost.

CALL_ATTR should be implementable with no impact on existing Python code, except bytecode hacks. It should be possible to retain a fully backwards-compatible __getattr__ method, for places where method objects are used (including the C API). Likewise, the default __callattr__ could be set up to first check if __getattr__ is defined, then the instance's dictionary or __slots__. For additional speed gains, new-object-model classes could set '__fast_methods__ = True' and gain a semantic distinction between __getattr__ and __callattr__.

Better still, I think that Jython could use the subtle semantic change to make Java reflection less expensive. (Java's `new' is more expensive than C's `malloc', after all.)

I have a sneaking suspicion that this would also be good for security purposes. I haven't yet come up with a specific case where this is a big deal, but I think capability-style data-hiding would be simplified if filtering method calls were different from filtering attributes.

I hope this idea is useful to some of you,

--
| <`'>     | Glyph Lefkowitz: Traveling Sorcerer |
| < _/ >   | Lead Developer, the Twisted project |
| < ___/ > | http://www.twistedmatrix.com        |

    import time

    class A:
        def b(self):
            pass

    def b(self):
        pass

    a = A()

    def wallclock(f):
        then = time.time()
        f()
        now = time.time()
        elapsed = now - then
        return elapsed

    NCALLS = 1000000

    def methods():
        for x in xrange(NCALLS):
            a.b()

    def functions():
        for x in xrange(NCALLS):
            b(a)

    print wallclock(methods)
    print wallclock(functions)

    % python2.0 methods.py
    1.85362696648
    1.47611200809
    % python2.1 methods.py
    1.21098303795
    0.972733020782
    % python2.2 methods.py
    1.15857589245
    0.914402961731
    % jython methods.py
    63.396000027656555
    51.51300001144409
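[For reference, a minimal dis sketch of the bytecode difference described above; the helper names call_method and call_function are purely illustrative. In CPython 2.x, a.b() compiles to a LOAD_ATTR, which is where the bound instancemethod object gets built, followed by a CALL_FUNCTION; the proposed CALL_ATTR would fuse that pair into one opcode so the intermediate object never has to exist.]

    import dis

    class A:
        def b(self):
            pass

    def b(self):
        pass

    a = A()

    def call_method():
        a.b()    # LOAD_GLOBAL a; LOAD_ATTR b; CALL_FUNCTION 0

    def call_function():
        b(a)     # LOAD_GLOBAL b; LOAD_GLOBAL a; CALL_FUNCTION 1

    dis.dis(call_method)      # shows the LOAD_ATTR / CALL_FUNCTION pair
    dis.dis(call_function)    # no attribute lookup, just the call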

Glyph Lefkowitz <glyph@twistedmatrix.com> writes:
I have an idea that may result in significant optimization of python. However, I have an incomplete understanding of what's involved, and I haven't had enough time to puzzle out enough of Python's internals to write up a full PEP describing this.
Gotta run, but: interesting! Would be more interesting if it was implemented :-) This is a roundabout way of saying that I don't think implementation will be entirely straightforward/there may be easier ways to speed up method calls. More later.

Cheers,
M.

--
Strangely enough I saw just such a beast at the grocery store last night. Starbucks sells Javachip. (It's ice cream, but that shouldn't be an obstacle for the Java marketing people.) -- Jeremy Hylton, 29 Apr 1997

Michael Hudson <mwh@python.net> writes:
Would be more interesting if it was implemented :-) This is a roundabout way of saying that I don't think implementation will be entirely straightforward/there may be easier ways to speed up method calls.
Actually, the "easier ways" I had in mind are already implemented :-)

Cheers,
M.

--
If your telephone company installs a system in the woods with no one around to see them, do they still get it wrong? -- Robert Moir, alt.sysadmin.recovery

Yes, this has long been on my list of speed-ups. Does anybody have time to work on it? --Guido van Rossum (home page: http://www.python.org/~guido/)

Glyph Lefkowitz <glyph@twistedmatrix.com> writes:
Method calls appear to be a full 20% slower (simple benchmark included) than function calls, and Python function calls are already pretty slow. By my understanding, one of the reasons for the difference is that if you have a method call like this:
    a = A()
    a.b()
what's really happening is something along the lines of:
    temp = new.instancemethod(getattr(a.__class__, "b"), a, A)
    temp()
    free(temp)
This causes an unnecessary memory allocation: the instancemethod object is created, immediately called, and then garbage collected. Looking at the output of dis.dis, I can see there are also 3 bytecodes being evaluated rather than 1.
Can this allocation be avoided? Ahh, by the 'atomic' implementation of the single CALL_ATTR opcode, using a statically allocated instancemethod instead of a new one? Is this what you have in mind?

Thomas

[Glyph Lefkowitz on __callattr__]
CALL_ATTR should be implementable with no impact on existing python code, except bytecode hacks. It should be possible to retain a fully backwards-compatible __getattr__ method, for places where method objects are used (including the C API). Likewise, the default __callattr__ could be set up to first check if __getattr__ is defined, then the instance's dictionary or __slots__. For additional speed gains, new-object-model classes could set '__fast_methods__ = True' and gain a semantic distinction between __getattr__ and __callattr__.
Better still, I think that Jython could use the subtle semantic change to make Java reflection less expensive. (Java's `new' is more expensive than C's `malloc', after all.)
For the record, Jython already has invoke(...) (what you call __callattr__) and uses it to optimize certain simple method calls on instances. It is not currently used to optimize calls into Java reflection.

Oh, as a cute aside: when I tried to run your timing program, I realized that the optimization had been disabled by mistake in CVS during the AST parse tree rewrite. Thanks for your help in pointing that out.

Without the "invoke" optimization, I get these numbers:

    2.865000009536743
    1.7020000219345093

With the optimization added, I get this:

    2.052999973297119
    1.652999997138977

regards,
finn

My proposal is to treat method calls as syntactically different from function calls. Rather than x.y() being completely synonymous with getattr(x, "y")(), it could be analogous to 'x.y = z' or 'del x.y'. For symmetry with these statement types, the new bytecode could be called CALL_ATTR.
I'm not sure exactly what you're proposing here. I'm all for making the bytecode compiler recognize the special case <object>.<name>(<args>) and emit a special bytecode for it that can be executed more efficiently in the common case. But I want the semantic *definition* to be unchanged: it should really mean the same thing as getattr(<object>, "<name>")(<args>). This may limit the possibilities for optimization, but any change in semantics for something as fundamental as this is going to break too much stuff. Besides, I think that semantic definition is exactly right.

Here's how I think the CALL_ATTR opcode should work:

    if <obj> is a new-style instance:
        if the class's getattr policy is standard:
            if <name> not in <obj>.__dict__:
                search <obj>.__class__ and its base classes for <name>
                if found and the result is a Python function:
                    call it with arguments (<obj>, <args>) and return
    elif <obj> is a classic instance:
        if <name> not in <obj>.__dict__:
            search <obj>.__class__ and its base classes for <name>
            if found and the result is a Python function:
                call it with arguments (<obj>, <args>) and return

    # if we get here, the optimization doesn't apply
    tmp = getattr(<obj>, <name>)
    return tmp(<args>)

--Guido van Rossum (home page: http://www.python.org/~guido/)
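[A rough pure-Python rendering of that fast path, for illustration only: the real CALL_ATTR would live in the eval loop, and this sketch ignores metaclasses, non-standard __getattr__ hooks, and new-style lookup details such as data descriptors.]

    import types

    def class_lookup(klass, name):
        # depth-first, left-to-right search of the class and its bases,
        # as done for classic classes
        if name in klass.__dict__:
            return klass.__dict__[name]
        for base in klass.__bases__:
            found = class_lookup(base, name)
            if found is not None:
                return found
        return None

    def call_attr(obj, name, *args):
        # Fast path: if the attribute is not shadowed in the instance
        # dict and the class search finds a plain Python function, call
        # it directly -- no bound method object is ever created.
        try:
            shadowed = name in obj.__dict__
            klass = obj.__class__
        except AttributeError:
            shadowed, klass = True, None
        if klass is not None and not shadowed:
            func = class_lookup(klass, name)
            if isinstance(func, types.FunctionType):
                return func(obj, *args)
        # Slow path: fall back to the fully general semantics.
        return getattr(obj, name)(*args)

With this sketch, call_attr(a, 'b') behaves like a.b() in the common case while still deferring to getattr for everything else.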

I thought one thing Glyph was trying to address was the single-use nature of the instancemethod object. Once it's been created, can't you just cache it in the instance for later reuse? When it's needed, you borrow it from the instance, use it, then put it back (assuming the slot in the instance is still empty, otherwise you DECREF it). I think that would make it thread safe. Skip
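[What Skip describes is essentially an automatic version of a trick callers can already do by hand. A minimal sketch of the manual form, reusing the shape of Glyph's benchmark:]

    import time

    class A:
        def b(self):
            pass

    a = A()
    NCALLS = 1000000

    # Build the bound method once and reuse it, instead of letting every
    # a.b() allocate and then discard a fresh instancemethod object.
    a_b = a.b

    then = time.time()
    for x in xrange(NCALLS):
        a_b()
    print time.time() - then

The caching Skip proposes would give this effect transparently, which is where the invalidation concern in Guido's reply below comes in.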

I thought one thing Glyph was trying to address was the single-use nature of the instancemethod object. Once it's been created, can't you just cache it in the instance for later reuse? When it's needed, you borrow it from the instance, use it, then put it back (assuming the slot in the instance is still empty, otherwise you DECREF it). I think that would make it thread safe.
The caching would have to be done by the instance object's getattr() implementation; there are all sorts of situations where the cache would have to be invalidated and only the instance (really, its class) knows about that. And note that the compiler doesn't know the type of <obj> in <obj>.<name>(<args>). <obj> might be a module or an extension object. --Guido van Rossum (home page: http://www.python.org/~guido/)

On Friday, February 14, 2003, at 12:15 PM, Skip Montanaro wrote:
I thought one thing Glyph was trying to address was the single-use nature of the instancemethod object. Once it's been created, can't you just cache it in the instance for later reuse? When it's needed, you borrow it from the instance, use it, then put it back (assuming the slot in the instance is still empty, otherwise you DECREF it). I think that would make it thread safe.
But this would merely be trading space for speed, where neither is necessary. What I'm proposing (and I think what Guido already had on his list) is that the instancemethod object never be created, as you can optimize the syntax to mean "call this method" rather than "get this attribute, call the result".

[Guido van Rossum]
Here's how I think the CALL_ATTR opcode should work:
Which is basically what Jython does (except that Jython does not yet have new-style classes). Jython also does a tiny bit of handling of non-functions; maybe that is useful here too.
    if <obj> is a new-style instance:
        if the class's getattr policy is standard:
            if <name> not in <obj>.__dict__:
                search <obj>.__class__ and its base classes for <name>
                if found and the result is a Python function:
                    call it with arguments (<obj>, <args>) and return
    elif <obj> is a classic instance:
        if <name> not in <obj>.__dict__:
            search <obj>.__class__ and its base classes for <name>
            if found:
                if the result is a Python function:
                    call it with arguments (<obj>, <args>) and return
                else:
                    bind the non-function and call with (<args>)

    # if we get here, the optimization doesn't apply
    tmp = getattr(<obj>, <name>)
    return tmp(<args>)
regards, finn
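[A sketch of what that extra "bind the non-function" branch means in practice. The helper name call_attr_nonfunc is hypothetical, a new-style class is used for clarity, and base-class search and __getattr__ hooks are omitted: a class attribute that is not a plain Python function (for example a classmethod object) is bound through the descriptor protocol before being called.]

    import types

    class A(object):
        def plain(self):
            return "plain method"
        def shared(cls):
            return "classmethod on %s" % cls.__name__
        shared = classmethod(shared)    # pre-decorator spelling

    def call_attr_nonfunc(obj, name, *args):
        klass = obj.__class__
        if name not in obj.__dict__ and name in klass.__dict__:
            attr = klass.__dict__[name]
            if isinstance(attr, types.FunctionType):
                # plain function: call it unbound, no method object needed
                return attr(obj, *args)
            if hasattr(attr, '__get__'):
                # non-function: bind via the descriptor protocol, then call
                return attr.__get__(obj, klass)(*args)
        # anything else: fall back to the general path
        return getattr(obj, name)(*args)

    a = A()
    print call_attr_nonfunc(a, 'plain')     # -> plain method
    print call_attr_nonfunc(a, 'shared')    # -> classmethod on A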
participants (7):
- Finn Bock
- Glyph Lefkowitz
- Guido van Rossum
- Michael Hudson
- Neil Schemenauer
- Skip Montanaro
- Thomas Heller