Proposal for attribute and method caching (was: One Python 2.1 idea)

rturpin at my-deja.com rturpin at my-deja.com
Tue Dec 26 22:06:14 EST 2000


The paper by Lowis[1] confirms that attribute and method
lookup consume a large part of a Python program's execution
time.
His 38% reduction in execution time is attractive, even if
its measurement is muddy.

However, I think he is taking the wrong approach. He uses
class caches based on intern'd string ID. This requires
fairly significant additional data structures, whose
maintenance somewhat offsets the gains from caching. It
works for a small, fixed number of method names, losing
its advantage for large programs, where it is needed most.
His approach to instance attributes speeds up only the
failed lookup that is required when a method is invoked.
It does not speed attribute reference generally, which is
the more frequent case.

Python's semantics are beautiful and subtle. Method and
attribute caching need to be local, lightweight, and
tailored to Python. I've been thinking about this problem,
and propose a caching scheme below. This scheme should
eliminate dictionary lookups for the vast majority of
attribute and method references. Attributes are tackled
prior to methods. The scheme requires one or two new
primitive types, but the new data structures required are
local, and relatively lightweight. I explain the scheme in
three steps.

Two caveats. First, I'm putting this idea out, hoping
it sparks discussion for possible performance improvements
to Python. As much as I would love to take a few weeks to
experiment with this, my time is pretty heavily committed
for the next year. Second, I am no expert on how Python
now is implemented. I think I understand Python semantics
pretty well. Still, I may be missing something, in which
case I expect those who know more than I do to cover the
ideas below with well-thrown tomatoes.


STEP 1: LIGHTEN INSTANCE DICTIONARIES

In Python, every class instance has a dictionary of its
attributes, built up at run time. This means
instances are relatively heavy-weight, compared to
statically bound OO languages such as C++. (In C++, an
instance for a complex number type typically stores
nothing more than two real numbers.) This first step
slightly modifies how instances are represented. It does
NOT improve performance generally, but is a precursor to
the next step.

Define a new Mapping type called a light dictionary.
This type has exactly the same exposed interface as the
dictionary, except for its constructor and a few additional
operations available to the Python machine. A light
dictionary is implemented by three attributes that are
NOT exposed. They are available only to the C code that
implements the type. These are:

  indexDict: A dictionary mapping n strings into range(n)
  indexedVal: A static array [0:n-1] of Python objects
  extraDict: A dictionary, initially None

At first glance, a light dictionary looks a good deal
heavier than an ordinary dictionary, since it includes two
of them as attributes. But the indexDict is a constant
dictionary, created at compile time, shared by any number
of light dictionaries. And the extraDict typically is not
used. The index dictionary holds the "expected" set of
names. For a class instance, this is all instance
attributes that can be found in the class definition,
or from any inherited class definitions. (Conditionally
assigned or not.) Attribute values are stored in the
indexedVal array:

    __dict__.indexedVal[indexDict["foo"]] = value of x.foo

The index dictionary that initializes an instance's light
dictionary is a class attribute: __attrIndexDict__. It is
created when the class is defined, it is NOT exposed, and
it never changes after it is created. It is the most
complex internal data structure whose addition I propose.

Light dictionaries support a few operations available only
to the Python machine. I will use Python code to describe
their behavior, and later, show how they are used. But
these are NOT methods. They should be implemented directly
in the Python machine, and run much faster than methods.

    def getIndex(self, key):
        if self.indexDict:
            return self.indexDict.get(key, None)

    def getByIndex(self, index):
        if self.indexDict:
            if type(self.indexedVal[index])!=UnassignedType:
                return self.indexedVal[index]
            else:
                raise AttributeError
        else:
            raise <some exception>

    def setByIndex(self, index, value):
        if self.indexDict:
            self.indexedVal[index] = value
        else:
            raise <some exception>

    def isIndexable(self):
        if self.indexDict:
            return 1

Getting values from and setting values in the indexedVal
array should be true C index operations.  getByIndex()
must throw an exception if the indexed element is unassigned.
There are several ways to do this. I would suggest adding
another primitive type, Unassigned.
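The operations above can be made concrete with a runnable
toy, in which the proposed Unassigned primitive is modeled
by a private sentinel object. The method names follow the
text, but remember these would be implemented directly in
the Python machine, not as Python methods:

```python
# Toy sketch of the four light-dictionary operations. _UNASSIGNED
# stands in for the proposed Unassigned primitive type.

_UNASSIGNED = object()

class LightDict:
    def __init__(self, index_dict):
        self.indexDict = index_dict            # shared, built per class
        self.indexedVal = [_UNASSIGNED] * len(index_dict)

    def getIndex(self, key):
        if self.indexDict is not None:
            return self.indexDict.get(key)     # None if key is unknown

    def getByIndex(self, index):
        value = self.indexedVal[index]
        if value is _UNASSIGNED:
            raise AttributeError               # deleted or never assigned
        return value

    def setByIndex(self, index, value):
        self.indexedVal[index] = value

    def isIndexable(self):
        return self.indexDict is not None

d = LightDict({"foo": 0, "bar": 1})
i = d.getIndex("foo")
d.setByIndex(i, 42)
print(d.getByIndex(i))   # -> 42
```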

Using a light dictionary for __dict__, object instances
are a little more like the instances of static languages
such as C++. The initial representation of an instance's
attributes is a reference to a class index dictionary, and
an array to hold the attributes' future values.

A light dictionary mimics exactly the behavior of a
dictionary. A new key that is not an intern'd string is put
into extraDict. Key lookup is done in indexDict, for
intern'd strings, and in extraDict, for other strings.
Deleting one of the keys in indexDict is done by setting
the indexed element of indexedVal to Unassigned. Deleting
a key from extraDict is done normally. A new key that is an
intern'd string copies all the index keys and values into
extraDict, sets indexDict to None, and then adds the new
key to extraDict. The light dictionary then is no longer
indexable.

The performance of a light dictionary is little different
from that of an ordinary dictionary. Except for a couple of
tests, the exposed operations are turned immediately into
the same operation on indexDict or extraDict. There is a
small gain on first assignment to the indexed keys, and a
penalty when a new intern'd string is added.

Importantly, a light dictionary has the same semantics
and exposes the same interface as an ordinary dictionary.
Anything the Python programmer does with x.__dict__ he can
still do, with the same effect. He can't reach the class's
__attrIndexDict__, because it is strictly an internal data
structure, and the internal attributes of a light
dictionary are not exposed.


STEP 2: CACHE ATTRIBUTE REFERENCES

The big win from light dictionaries is that attribute
reference can be made a pure indexing operation, for
attributes that are determined when the class is defined.
Consider a typical attribute reference:

    y = x.foo

The byte code holds the string "foo" via a reference:

    _0: "foo"

We're going to add three other pieces of information, all
either references or integers:

    _0: "foo"
    _1: x.__class__, initially set to None
    _2: -1, reserved for methods
    _3: foo's index in x's light dictionary

Of course, x's class might be different for different
executions of this code fragment. Local caching derives its
benefit from the fact that most times, x has the same class
as the previous pass through the same code. To retrieve the
attribute, the Python machine performs an operation that
has the following behavior:

    if x.__dict__.isIndexable():
        if x.__class__!=_1:     # New class
            _1 = x.__class__
            _3 = x.__dict__.getIndex("foo")
        if _3 is not None:
            y = x.__dict__.getByIndex(_3)
        else:
            <search other namespaces>
    else:                       # Polluted instance namespace
        if x.__dict__.has_key("foo"):
            y = x.__dict__.get("foo")
        else:
            <search other namespaces>

In the good case, three comparisons and an indexing operation
return the attribute value. On the first time through, the
class's dictionary must be queried. Once the instance's
dictionary is "polluted" with intern'd strings other than its
known attributes, behavior is exactly as it is now, with the
extra overhead of a comparison. If the good case is the
majority of the time, attribute references will be made MUCH
faster. A similar operation, but using setByIndex(), is used
when writing to an attribute.
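The per-site cache can be simulated with a small object
holding the _0.._3 slots. AttrCacheSite and load are
invented names, and the instance's light dictionary is
reduced to a plain index map plus a slot list:

```python
# Toy simulation of the inline attribute cache described above.

class AttrCacheSite:
    def __init__(self, name):
        self.name = name     # _0: the attribute name
        self.klass = None    # _1: last seen class
        self.index = None    # _3: cached slot index

    def load(self, obj, index_dict, slots):
        if type(obj) is not self.klass:        # new class: refill cache
            self.klass = type(obj)
            self.index = index_dict.get(self.name)
        if self.index is not None:             # good case: pure indexing
            return slots[self.index]
        # here the real scheme would search the other namespaces
        raise AttributeError(self.name)

class Point:
    pass

site = AttrCacheSite("foo")
index_dict = {"foo": 0, "bar": 1}
slots = [42, 99]
p = Point()
print(site.load(p, index_dict, slots))  # -> 42 (first pass fills the cache)
print(site.index)                       # -> 0  (reused on later passes)
```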

The common case where this caching technique fails is when
a superclass's method is applied to instances of different
subtypes. It might be possible to optimize a bit further, by
making each class's __attrIndexDict__ an extension of its
superclass's. A little twiddling would restore caching for
"inherited" attributes, even when every instance is a
different subtype. I haven't thought about this further
optimization much, and it might not work for multiple
inheritance.

There is one last "gotcha" to handle. In Python, you can
change the class of an instance by assigning the __class__
attribute. When __class__ changes, __dict__ must be forced
non-indexable, as if an intern'd string were added to it.
After this, __dict__ behaves exactly like a normal
dictionary, because it holds a normal dictionary to which
it delegates all methods.


STEP 3: CACHE METHOD REFERENCES

There are two problems faced by any attempt to cache Python
methods. First, instance attributes (including functions)
may override defined methods. Second, class methods can be
changed on the fly.

The algorithm below caches only methods whose names (a) are
intern'd strings, and (b) are not predetermined attribute
names, i.e., don't appear in the class's __attrIndexDict__.
Methods that are so qualified are marked in a way that can
be inspected by the Python machine. Abusing notation, I
will indicate this through an attribute on the method:

    C.method.__qualified__

Given this qualification of methods, the first problem can
be solved by making sure no new intern'd names have been
added to the instance's namespace. Fortunately, this is
exactly what x.__dict__.isIndexable() does.

To solve the second problem, a new attribute is added to
class objects: __methodChanges__. It counts the number of
times that the methods of the class OR ITS SUBCLASSES have
changed. Initialized to zero, whenever a method is changed
in a class, this attribute is incremented, all the way up
the class hierarchy.
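A sketch of the counter maintenance. The set_method helper
is an invented name, standing in for class machinery that
would bump the counter automatically whenever a method is
rebound:

```python
# Sketch of __methodChanges__ propagation up the class hierarchy.

class Base:
    __methodChanges__ = 0

class Sub(Base):
    __methodChanges__ = 0

def set_method(cls, name, func):
    # Bind the method, then bump the counter on cls and on every
    # ancestor that keeps its own counter, all the way up.
    setattr(cls, name, func)
    for c in cls.__mro__:
        if '__methodChanges__' in c.__dict__:
            setattr(c, '__methodChanges__',
                    c.__dict__['__methodChanges__'] + 1)

set_method(Sub, 'meth', lambda self: 'new')
print(Sub.__methodChanges__, Base.__methodChanges__)   # -> 1 1
```

Note that changing a method in Base bumps only Base's own
counter, exactly as the text specifies: each class counts
changes in itself and its subclasses, not its superclasses.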

    x.meth(..)

The typical method invocation, shown above, uses a constant
name. As with attributes, this name is contained via
reference in the byte code. It is extended in a fashion
that is compatible with attribute references:

    _0: "meth"
    _1: x.__class__, initially None
    _2: the last value of x.__class__.__methodChanges__
    _3: the last value of "meth", i.e., a method pointer

To find the method, the Python machine performs the
following operation:

    if (x.__class__==_1) \
    and x.__dict__.isIndexable() \
    and (x.__class__.__methodChanges__==_2):
        method = _3
    else:
        <apply existing algorithm to set method>
        if method.__qualified__:
            _1 = x.__class__
            _2 = _1.__methodChanges__
            _3 = method

In the good case, the method is already cached, and it
is verified with just a few comparisons. Note that
__methodChanges__, like __class__, should be implemented
more directly than through a namespace dictionary.
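The check above, as a runnable toy. MethodCacheSite is an
invented name, and the isIndexable() test is omitted since
the toy has no light dictionaries:

```python
# Toy simulation of the per-site method cache.

class MethodCacheSite:
    def __init__(self, name):
        self.name = name     # _0: the method name
        self.klass = None    # _1: last seen class
        self.version = -1    # _2: last __methodChanges__ value
        self.method = None   # _3: cached method

    def lookup(self, obj):
        cls = type(obj)
        if cls is self.klass and cls.__methodChanges__ == self.version:
            return self.method               # good case: a few comparisons
        method = getattr(cls, self.name)     # slow path: normal lookup
        self.klass = cls
        self.version = cls.__methodChanges__
        self.method = method
        return method

class C:
    __methodChanges__ = 0
    def meth(self):
        return 'old'

site = MethodCacheSite('meth')
c = C()
print(site.lookup(c)(c))    # -> old (cache filled)
C.meth = lambda self: 'new'
C.__methodChanges__ += 1    # the real scheme bumps this automatically
print(site.lookup(c)(c))    # -> new (counter changed, cache refilled)
```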


[1] The Lowis paper is at:
http://www.foretec.com/python/workshops/1998-11/proceedings.html

Russell

