[Python-Dev] Speeding up instance attribute access
Guido van Rossum
Fri, 08 Feb 2002 14:18:20 -0500
(By mistake I sent an incomplete version of this earlier. Please
ignore that and read this instead.)
Inspired by the second half of Jeremy's talk on DevDay, here's my
alternative approach for speeding up instance attribute access. Like
my idea for globals, it uses double indirection rather than
- We only care about attributes of 'self' (which is identified as the
first argument of a method, not by name). We can exclude functions
from our analysis that make any assignment to self -- this is
extremely rare and would throw off our analysis. We should also
exclude static methods and class methods, since their first argument
doesn't have the same role.
- Static analysis of the source code of a class (without access to the
base class) can determine attributes of the class, and to some extent
instance variables. Without also analyzing the base classes, this
analysis cannot reliably distinguish between instance variables and
methods inherited from a base class; it can distinguish between
instance variables and methods defined in the current class.
- We can guess the status of un-assigned-to inherited attributes by
seeing whether they are called or not. This is not 100% accurate,
so we need things to work (if slower) even when we guess wrong.
- For instance variable references and stores of the form self.<name>,
the bytecode compiler emits opcodes LOAD_SELF_IVAR <i> and
STORE_SELF_IVAR <i>, where <i> is a small int identifying the
instance variable (ivar). A particular ivar is identified by the
same <i> throughout all methods defined in the same class statement,
but there is no attempt to coordinate this across different classes
related by inheritance.
- Some data structure describing the mapping from <i> to attribute
name, and whether it's an ivar or a method, is produced by the
compiler and stored in the class, in a way that a user can't change
it. The function objects representing methods also contain a
pointer to this data structure. (Or the code objects? But it needs
to be shared. Details, details.)
- At *run time*, when a class object is created, another data
structure is created that accumulates the <i>-to-name mappings from
that class and all its base classes. This has more accurate
information because it can collect information from the bases
(though it can still be fooled by dynamic manipulations of classes).
In particular, it has the canonical set of known instance variables
for instances of this class. This is stored in the class object, in
a way that a user can't change it.
- When an instance is created, the run-time data structure stored in
the class is consulted to know how many instance variables to
allocate. An array is allocated at the end of the instance (or in a
separately allocated block?) with one PyObject pointer for each
known instance variable. There's also a pointer to the instance
__dict__, to hold instance variables that the parser didn't spot,
but this pointer starts off as NULL -- a dictionary is created for
it only when needed.
- There is no requirement that the layout of the ivar array for
instances of a subclass is an extension of the layout of the ivar
array for its base classes. But there *is* a requirement that all
instances of the same class that are not instances of a subclass
(i.e., all x and y where x.__class__ is y.__class__) have the same
layout, and this layout is determined by the run-time data structure
stored in the class.
- Now all we need is an efficient way to map LOAD_SELF_IVAR <i> to an
index in the array of ivars. Two different classes play a role
here: the class of self (the run-time class) and the class that
defined the method containing the LOAD_SELF_IVAR <i> opcode (the
compile-time class). We assume the run-time class is a subclass of
the compile-time class. The correct mapping can easily be
calculated from the data structures left behind by the compiler in
the compile-time class and by class construction at run time in the
run-time class. Since this mapping doesn't change, it can be
calculated once and then cached. The cache can be held in the
run-time class; it can be a dictionary whose keys are compile-time
classes. This means a single dict lookup that must be done once per
method call (and only when LOAD_SELF_IVAR <i> or STORE_SELF_IVAR <i>
is used). We could save even that dict lookup in most cases by
caching the run-time class and the outcome with the method.
- We need fallbacks for various exceptional cases:
* If the compile-time class uses LOAD_SELF_IVAR <i> but the run-time
class doesn't think that is an instance variable, LOAD_SELF_IVAR
must fall back to look in the instance dict (if non-NULL) and then
down the run-time class and its base classes.
* If the ivar slot in the instance corresponding to <i> exists but
is NULL, LOAD_SELF_IVAR <i> must fall back to searching the
run-time class and its base classes.
For both cases, the mapping from <i> to attribute names must be
available. The language and the current code generation guarantee
that 'self' is the first local variable.
- The instance __dict__ must become a proxy that knows whether a given
name is stored in the array of ivars or in the overflow dict; this
is much like Jeremy's DLict.
- Note that there are two sources of savings: the major savings
(probably) comes from avoiding a dict lookup for every ivar access;
an additional minor savings comes from collapsing two opcodes:
LOAD_FAST 0 (self)
LOAD_ATTR 1 (foo)
LOAD_SELF_IVAR 0 (self.foo)
- I don't know if we should try to generate LOAD_SELF_IVAR <i> only
for things that really are (likely) ivars, or for all attributes.
Maybe it would be nice if we also had a single-opcode way to express
a method call on self, e.g. CALL_SELF_METHOD <i>, <n>, <k> where <i>
identifies the method, and <n> and <k> are the number of positional
and keyword arguments. Or maybe we should just have
LOAD_SELF_METHOD <i> which looks in the instance overflow dict (but
only if non-NULL) and in the class and bases, but can avoid looking
in the ivars array if <i> does not describe a known ivar (again this
information can be cached).
- The required global analysis is a bit hairy, and not something we
already do. I believe that Jeremy thinks PyChecker already does
this; I'm not sure if that means we can borrow code or just ideas.
--Guido van Rossum (home page: http://www.python.org/~guido/)