[Python-Dev] Speeding up instance attribute access

Fri, 08 Feb 2002 14:18:20 -0500

(By mistake I sent an incomplete version of this earlier.  Please
ignore that and read this instead.)

Inspired by the second half of Jeremy's talk on DevDay, here's my
alternative approach for speeding up instance attribute access.  Like
my idea for globals, it uses double indirection rather than
recompilation.

- We only care about attributes of 'self' (which is identified as the
  first argument of a method, not by name).  We can exclude functions
  from our analysis that make any assignment to self -- this is
  extremely rare and would throw off our analysis.  We should also
  exclude static methods and class methods, since their first argument
  doesn't have the same role.

- Static analysis of the source code of a class (without access to the
  base class) can determine attributes of the class, and to some extent
  instance variables.  Without also analyzing the base classes, this
  analysis cannot reliably distinguish between instance variables and
  methods inherited from a base class; it can distinguish between
  instance variables and methods defined in the current class.

- We can guess the status of un-assigned-to inherited attributes by
  seeing whether they are called or not.  This is not 100% accurate,
  so we need things to work (if slower) even when we guess wrong.

- For instance variable references and stores of the form self.<name>,
  the bytecode compiler emits opcodes LOAD_SELF_IVAR <i> and
  STORE_SELF_IVAR <i>, where <i> is a small int identifying the
  instance variable (ivar).  A particular ivar is identified by the
  same <i> throughout all methods defined in the same class statement,
  but there is no attempt to coordinate this across different classes
  related by inheritance.

- Some data structure describing the mapping from <i> to attribute
  name, and whether it's an ivar or a method, is produced by the
  compiler and stored in the class, in a way that a user can't change
  it.  The function objects representing methods also contain a
  pointer to this data structure.  (Or the code objects?  But it needs
  to be shared.  Details, details.)

- At *run time*, when a class object is created, another data
  structure is created that accumulates the <i>-to-name mappings from
  that class and all its base classes.  This has more accurate
  information because it can collect information from the bases
  (though it can still be fooled by dynamic manipulations of classes).
  In particular, it has the canonical set of known instance variables
  for instances of this class.  This is stored in the class object, in
  a way that a user can't change it.

- When an instance is created, the run-time data structure stored in
  the class is consulted to know how many instance variables to
  allocate.  An array is allocated at the end of the instance (or in a
  separately allocated block?) with one PyObject pointer for each
  known instance variable.  There's also a pointer to the instance
  __dict__, to hold instance variables that the parser didn't spot,
  but this pointer starts off as NULL -- a dictionary is created for
  it only when needed.

- There is no requirement that the layout of the ivar array for
  instances of a subclass is an extension of the layout of the ivar
  array for its base classes.  But there *is* a requirement that all
  instances of the same class that are not instances of a subclass
  (i.e., all x and y where x.__class__ is y.__class__) have the same
  layout, and this layout is determined by the run-time data structure
  stored in the class.

- Now all we need is an efficient way to map LOAD_SELF_IVAR <i> to an
  index in the array of ivars.  Two different classes play a role
  here: the class of self (the run-time class) and the class that
  defined the method containing the LOAD_SELF_IVAR <i> opcode (the
  compile-time class).  We assume the run-time class is a subclass of
  the compile-time class.  The correct mapping can easily be
  calculated from the data structures left behind by the compiler in
  the compile-time class and by class construction at run time in the
  run-time class.  Since this mapping doesn't change, it can be
  calculated once and then cached.  The cache can be held in the
  run-time class; it can be a dictionary whose keys are compile-time
  classes.  This means a single dict lookup that must be done once per
  method call (and only when LOAD_SELF_IVAR <i> or STORE_SELF_IVAR <i>
  is used).  We could save even that dict lookup in most cases by
  caching the run-time class and the outcome with the method.

- We need fallbacks for various exceptional cases:

  * If the compile-time class uses LOAD_SELF_IVAR <i> but the run-time
    class doesn't think that is an instance variable, LOAD_SELF_IVAR
    must fall back to look in the instance dict (if non-NULL) and then
    down the run-time class and its base classes.

  * If the ivar slot in the instance corresponding to <i> exists but
    is NULL, LOAD_SELF_IVAR <i> must fall back to searching the
    run-time class and its base classes.

  For both cases, the mapping from <i> to attribute names must be
  available.  The language and the current code generation guarantee
  that 'self' is the first local variable.

- The instance __dict__ must become a proxy that knows whether a given
  name is stored in the array of ivars or in the overflow dict; this
  is much like Jeremy's DLict.

- Note that there are two sources of savings: the major savings
  (probably) comes from avoiding a dict lookup for every ivar access;
  an additional minor savings comes from collapsing two opcodes:

      LOAD_FAST       0 (self)
      LOAD_ATTR       1 (foo)

  into one:

      LOAD_SELF_IVAR  0 (self.foo)

- I don't know if we should try to generate LOAD_SELF_IVAR <i> only
  for things that really are (likely) ivars, or for all attributes.
  Maybe it would be nice if we also had a single-opcode way to express
  a method call on self, e.g. CALL_SELF_METHOD <i>, <n>, <k> where <i>
  identifies the method, and <n> and <k> are the number of positional
  and keyword arguments.  Or maybe we should just have
  LOAD_SELF_METHOD <i> which looks in the instance overflow dict (but
  only if non-NULL) and in the class and bases, but can avoid looking
  in the ivars array if <i> does not describe a known ivar (again this
  information can be cached).

- The required global analysis is a bit hairy, and not something we
  already do.  I believe that Jeremy thinks PyChecker already does
  this; I'm not sure if that means we can borrow code or just ideas.

--Guido van Rossum (home page: http://www.python.org/~guido/)