[Python-3000] Bound and unbound methods
Nick Coghlan
ncoghlan at gmail.com
Mon Aug 14 04:27:57 CEST 2006
Talin wrote:
> Anyway, I just wanted to throw that out there. Feel free to -1 away... :)
Based on the later discussion, I see two interesting possibilities:
1. A special CALL_METHOD opcode that the compiler emits when it spots the
".NAME(ARGS)" pattern. This could simply be an optimisation performed by the
bytecode emitter when processing an AST Call node with an Attribute node as
the "func" subnode (it would need to poke around inside the Attribute node,
rather than generating the Attribute node's code normally, though). For
functions, this opcode could bypass __get__ and invoke __call__ directly with
the right arguments. Put the actual optimisation into PyObject_CallMethod and
call that from the new opcode, and more than just the eval loop would benefit.
This could also be done by the addition of a MethodCall AST node, and an
AST->AST optimizing pass that took the Call+Attribute node and merged them
into a single MethodCall node (the concrete parser can't look far enough ahead
to figure out that a given attribute access is part of a method call).
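As a rough illustration of the pattern the compiler would have to spot, here is a small sketch using today's ast module. The helper name find_method_calls is made up for this example, and it only reports the pattern; it does not emit any opcode or build a MethodCall node.

```python
import ast


def find_method_calls(source):
    """Return the attribute name of every Call node whose func is an Attribute."""
    names = []
    for node in ast.walk(ast.parse(source)):
        # ".NAME(ARGS)" shows up as Call(func=Attribute(attr=NAME), ...)
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            names.append(node.func.attr)
    return names


# Only the first statement is a method call: len(items) calls a plain name,
# and "items.pop" is an attribute access that is never called.
print(find_method_calls("items.append(1)\nlen(items)\nf = items.pop"))  # prints ['append']
```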
Option 1 is focused on the speedup Talin mentioned. Aside from the additional
complexity in the code generation phase, I don't see any real
downside - __get__ will only be bypassed when the interpreter *knows* what the
descriptor would do.
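To make "only bypass __get__ when the result is known" concrete, here is a pure-Python sketch of the kind of check such a fast path might make. The helper call_method is hypothetical and is not how the eval loop or PyObject_CallMethod is implemented; it also assumes the type does not override __getattribute__.

```python
import types


def call_method(obj, name, *args, **kwargs):
    """Hypothetical stand-in for a CALL_METHOD fast path."""
    attr = None
    # Find the raw class attribute without triggering any descriptor.
    for klass in type(obj).__mro__:
        if name in klass.__dict__:
            attr = klass.__dict__[name]
            break
    inst_dict = getattr(obj, "__dict__", {})
    if isinstance(attr, types.FunctionType) and name not in inst_dict:
        # Known case: function.__get__ would only bind obj as the first
        # argument, so skip the bound method and call the function directly.
        return attr(obj, *args, **kwargs)
    # Anything else goes through the normal attribute protocol.
    return getattr(obj, name)(*args, **kwargs)


class Point:
    def __init__(self, x):
        self.x = x

    def scaled(self, k):
        return self.x * k

    @staticmethod
    def origin():
        return Point(0)


print(call_method(Point(2), "scaled", 3))   # fast path, prints 6
print(call_method(Point(2), "origin").x)    # staticmethod takes the slow path, prints 0
```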
2. Rewrite the __get__ methods on functions, classmethod and staticmethod to
cache the resulting method object in the class dictionary or instance
dictionary. This would entail making method objects descriptors that returned
a bound copy of themselves when retrieved through an instance. That way, for
methods that are never called, the method objects are never created, but for
methods that are used, the method object is created only once. Something would
need to be done to make this work for objects without an instance dictionary
- those could either continue to not cache their instance methods, they could
have a lazily initialized __dict__ pointer that is instantiated the first time
it is needed instead of no dict at all (yay, attributes on object()
instances!), or else there could be an id() keyed cache internal to the
interpreter.
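A minimal sketch of the caching idea, written as an ordinary descriptor rather than as a change to function.__get__. The class name CachingMethod is invented for this example, and it only covers the instance case (no caching in the class dictionary, and no caching for objects without a __dict__).

```python
import types


class CachingMethod:
    """Hypothetical non-data descriptor that caches the bound method per instance."""

    def __init__(self, func):
        self.func = func
        self.name = func.__name__

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self.func
        bound = types.MethodType(self.func, instance)
        inst_dict = getattr(instance, "__dict__", None)
        if inst_dict is not None:
            # Instance attributes shadow non-data descriptors, so the next
            # lookup finds this cached object and never reaches __get__.
            inst_dict[self.name] = bound
        return bound


class Stack:
    def __init__(self):
        self.items = []

    @CachingMethod
    def push(self, value):
        self.items.append(value)


s = Stack()
assert "push" not in s.__dict__    # nothing is created until the first access
first = s.push
assert s.push is first             # later lookups return the cached object
assert Stack().push is not first   # other instances get their own bound method
```

Note that the cached bound method refers back to its instance, so each cached entry also creates a reference cycle through the instance dictionary.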
I personally would favour the option of making __dict__ available by default
(i.e. put that behaviour in object), with no caching occurring if the object
had no __dict__ attribute at all. Tuples and the numeric types could continue
not to support attributes (as allocating space for an extra pointer would be a
big size increase for them in their general usage pattern, and they don't
generally have methods that are called from Python), while the other builtin
types would acquire a usable __dict__ attribute (which may not be instantiated
until the first time it is needed, although if instance methods get cached, it
would be needed most of the time, so the extra complexity of lazy
initialization may not be worth it).
The interesting benefit of option 2 is that "assert list.append is
list.append" would now succeed, as would "s = []; assert s.append is
s.append". "assert [].index is [].index" would still fail though, as different
instances would get their own bound methods.
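For comparison, this is how an ordinary Python-level method behaves in present-day CPython 3 without any caching; the Greeter class is just a throwaway example.

```python
class Greeter:
    def hello(self):
        return "hi"


g = Greeter()
m1 = g.hello
m2 = g.hello

# Each attribute access builds a fresh bound method object...
print(m1 is m2)   # prints False
# ...but the two compare equal, since they wrap the same function and instance.
print(m1 == m2)   # prints True
print(m1.__self__ is g, m1.__func__ is Greeter.hello)   # prints True True
```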
The downside of option 2 is that it is slightly more likely to break stuff due
to the changes in semantics, and that it is a case of a genuine space-speed
tradeoff - this approach *will* use more memory than the current approach,
because bound method objects are always allocated permanently instead of being
ephemeral things.
OTOH, if you did both option 1 and option 2, the caching would occur only if
you retrieved a method without calling it immediately, and be bypassed most of
the time.
Cheers,
Nick.
--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
---------------------------------------------------------------
http://www.boredomandlaziness.org