[Python-3000] Bound and unbound methods

Mon Aug 14 04:27:57 CEST 2006

Talin wrote:
> Anyway, I just wanted to throw that out there. Feel free to -1 away... :)

Based on the later discussion, I see two interesting possibilities:

1. A special CALL_METHOD opcode that the compiler emits when it spots the 
".NAME(ARGS)" pattern. This could simply be an optimisation performed by the 
bytecode emitter when processing an AST Call node with an Attribute node as 
the "func" subnode (it would need to poke around inside the Attribute node, 
rather than generating the Attribute node's code normally, though). For 
functions, this opcode could bypass __get__ and invoke __call__ directly with 
the right arguments. Put the actual optimisation into PyObject_CallMethod and 
call that from the new opcode, and more than just the eval loop would benefit.
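As a rough illustration, the decision the new opcode (or an optimised PyObject_CallMethod) would make can be modelled in pure Python. The helper name call_method is hypothetical, and this sketch models only the semantics. The real speedup would come from doing the same checks in C without allocating a bound method.

```python
import types

def call_method(obj, name, *args, **kwargs):
    """Model of a hypothetical CALL_METHOD fast path for obj.name(*args)."""
    cls = type(obj)
    # Find the class attribute the way normal lookup would (along the MRO).
    attr = None
    for klass in cls.__mro__:
        if name in vars(klass):
            attr = vars(klass)[name]
            break
    # Only take the fast path when the outcome is known in advance:
    # default attribute lookup, a plain Python function, and no instance
    # attribute of the same name shadowing it (functions are non-data
    # descriptors, so the instance __dict__ wins).
    if (cls.__getattribute__ is object.__getattribute__
            and isinstance(attr, types.FunctionType)
            and name not in getattr(obj, "__dict__", {})):
        # Bypass __get__: no bound method object is created.
        return attr(obj, *args, **kwargs)
    # Everything else: ordinary attribute lookup, then call.
    return getattr(obj, name)(*args, **kwargs)
```

Anything the model cannot reason about, such as builtin methods, custom descriptors or an overridden __getattribute__, simply takes the ordinary path.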

This could also be done by the addition of a MethodCall AST node, and an 
AST->AST optimizing pass that took the Call+Attribute node and merged them 
into a single MethodCall node (the concrete parser can't look far enough ahead 
to figure out that a given attribute access is part of a method call).
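A minimal sketch of such an AST->AST pass, written with the standard ast module. The MethodCall node class is hypothetical: the real ast module has no such node, and compile() would reject a tree containing it. The pass only shows where the merge would happen.

```python
import ast

class MethodCall(ast.expr):
    """Hypothetical AST node for obj.attr(args); not part of the ast module."""
    _fields = ("obj", "attr", "args", "keywords")

class MergeMethodCalls(ast.NodeTransformer):
    """AST->AST pass merging Call(func=Attribute(...)) into MethodCall."""

    def visit_Call(self, node):
        # Transform nested calls (in the receiver or the arguments) first.
        self.generic_visit(node)
        func = node.func
        if isinstance(func, ast.Attribute):
            merged = MethodCall(obj=func.value, attr=func.attr,
                                args=node.args, keywords=node.keywords)
            return ast.copy_location(merged, node)
        return node

def count_method_calls(source):
    """Run the pass over source and count the merged MethodCall nodes."""
    tree = MergeMethodCalls().visit(ast.parse(source))
    return sum(isinstance(n, MethodCall) for n in ast.walk(tree))
```

For example, count_method_calls("a.b().c(d.e(1))") finds three method calls, while plain calls such as f(g(h)) are left alone.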

Option 1 is focused on the speedup Talin mentioned. Aside from the additional 
complexity in the code generation phase, I don't see any real downside: 
__get__ will only be bypassed when the interpreter *knows* what the 
descriptor would do.
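The reason a plain function's __get__ can be skipped is that its effect is fully known: binding just prepends the instance to the arguments. The sketch below (all names are hypothetical) contrasts that with staticmethod and classmethod, whose __get__ changes what the first argument is. That is why a fast path would need to check the descriptor's exact type rather than assume.

```python
import types

def f(self, x):
    return (self, x)

class C:
    plain = f
    static = staticmethod(f)
    klass = classmethod(f)

def call_via_get(obj, name, *args):
    """Normal semantics: run the class attribute's __get__, then call."""
    descr = vars(type(obj))[name]
    return descr.__get__(obj, type(obj))(*args)

def can_bypass_get(descr):
    """Hypothetical check: only plain functions have a __get__ whose
    effect (prepend the instance) the interpreter knows in advance."""
    return type(descr) is types.FunctionType

c = C()
```

Here call_via_get(c, "plain", 1) gives the same result as f(c, 1), whereas the staticmethod passes no instance at all and the classmethod passes C instead of c.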

2. Rewrite the __get__ methods on functions, classmethod and staticmethod to 
cache the resulting method object in the class dictionary or instance 
dictionary. This would entail making method objects descriptors that returned 
a bound copy of themselves when retrieved through an instance. That way, for 
methods that are never called, the method objects are never created, but for 
methods that are used, the method object is created only once. Something would 
need to be done to make this work for objects without an instance dictionary 
- those could either continue to not cache their instance methods, they could 
have a lazily initialized __dict__ pointer that is instantiated the first time 
it is needed instead of no dict at all (yay, attributes on object() 
instances!), or else there could be an id() keyed cache internal to the 
interpreter.
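A pure-Python sketch of the caching idea, assuming a hypothetical cached_method wrapper in place of changing the real function type. Because it defines no __set__, it is a non-data descriptor. Once it stores the bound method in the instance __dict__, ordinary attribute lookup finds the cached object there and __get__ is not called again. Objects without an instance dictionary fall back to creating a fresh bound method each time, which is the "continue to not cache" variant.

```python
import types

class cached_method:
    """Hypothetical function wrapper that caches bound methods per instance."""

    def __init__(self, func):
        self.func = func
        self.name = func.__name__

    def __set_name__(self, owner, name):
        # Record the attribute name used in the class body.
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            # Retrieved through the class: hand back the plain function.
            return self.func
        bound = types.MethodType(self.func, instance)
        inst_dict = getattr(instance, "__dict__", None)
        if inst_dict is not None:
            # Cache it so later lookups return this same object.
            inst_dict[self.name] = bound
        # Objects without an instance dict (e.g. __slots__ classes)
        # get a fresh bound method on every lookup, as today.
        return bound
```

Each cached entry refers back to its instance through __self__, so every cached method forms a reference cycle that only the cyclic garbage collector can reclaim. This is part of the space cost of this approach.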

I personally would favour the option of making __dict__ available by default 
(i.e. putting that behaviour in object), with no caching occurring if the object 
had no __dict__ attribute at all. Tuples and the numeric types could continue 
not to support attributes, as allocating space for an extra pointer would be a 
big size increase for them in their general usage pattern, and they don't 
generally have methods that are called from Python. The other builtin types 
would acquire a usable __dict__ attribute. It might not be instantiated until 
the first time it is needed, although if instance methods get cached it would 
be needed most of the time, so the extra complexity of lazy initialization may 
not be worth it.

The interesting benefit of option 2 is that "assert list.append is 
list.append" would now succeed, as would "s = []; assert s.append is 
s.append". "assert [].index is [].index" would still fail though, as different 
instances would get their own bound methods.
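For comparison, this is how method object identity behaves in current CPython 3, which (unlike the Python of this thread) has no unbound method objects. Each attribute lookup through an instance creates a new bound method, so the results compare equal but are not identical.

```python
class C:
    def f(self):
        pass

c = C()
print(c.f is c.f)            # False: each lookup creates a new bound method
print(c.f == c.f)            # True: same __self__ and __func__
print(C.f is C.f)            # True in Python 3: the plain function comes back
s = []
print(s.append is s.append)  # False: a new builtin method object per lookup
```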

The downside of option 2 is that it is slightly more likely to break stuff due 
to the changes in semantics, and that it is a case of a genuine space-speed 
tradeoff - this approach *will* use more memory than the current approach, 
because bound method objects are always allocated permanently instead of being 
ephemeral things.

OTOH, if you did both option 1 and option 2, the caching would occur only if 
you retrieved a method without calling it immediately, and would be bypassed 
most of the time.
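A compact, self-contained sketch of the two options combined, with hypothetical cached_method (option 2) and call_method (option 1) helpers standing in for the interpreter changes. Calling through the fast path never touches the cache, while retrieving the method as an ordinary attribute populates it.

```python
import types

class cached_method:
    """Hypothetical caching wrapper (option 2), trimmed to the essentials."""

    def __init__(self, func):
        self.func = func
        self.name = func.__name__

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self.func
        bound = types.MethodType(self.func, instance)
        if hasattr(instance, "__dict__"):
            instance.__dict__[self.name] = bound  # cache on retrieval
        return bound

def call_method(obj, name, *args):
    """Hypothetical CALL_METHOD fast path (option 1) aware of cached_method."""
    if name not in getattr(obj, "__dict__", {}):
        for klass in type(obj).__mro__:
            if name in vars(klass):
                descr = vars(klass)[name]
                if isinstance(descr, cached_method):
                    # Call the underlying function directly: nothing is cached.
                    return descr.func(obj, *args)
                break
    # Already cached, shadowed, or some other attribute: normal path.
    return getattr(obj, name)(*args)
```

For example, call_method(c, "bump") on a fresh instance leaves vars(c) without a "bump" entry, while a bare m = c.bump stores the bound method so that c.bump is m afterwards.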

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org