[Python-Dev] optimizing non-local object access

Samuele Pedroni Samuele Pedroni <pedroni@inf.ethz.ch>
Thu, 9 Aug 2001 19:39:33 +0200 (MET DST)

[Jeremy Hylton]
> I've been thinking about the same issues in my spare time.  I
> certainly think the interpreter could be made faster by assuming that
> most bindings are static most of the time.  Python has many places
> where bindings can change unexpectedly, but most of the time they
> don't change.
> My worry about your approach is that the track-object opcodes could
> add a lot of expense for objects only used once or twice.  If the
> function uses math.sin inside a loop, it's an obvious win.  If it uses
> it only once, it's not so clear.
I agree, and I wonder whether it is not going to explode (in term of effort and resources)
down a call graph.

> The other approach I had been thinking about, which may have made it
> into my pre-PEP post a few months ago, is to store module globals in a
> table that can be accessed like fast locals.  Any reference to another
> module would use the fast globals table.  For references to global
> names in another module, the calling module could lookup the offset of
> that name in the defining module's fast globals.
I rembemer I liked it.

> I have been thinking some about how to make a similar strategy work
> for instances.  I think we could apply some lessons from the Self
> compiler and recompile generic bytecodes ("load the 'foo' attribute")
> into more specific bytecodes ("load slot 12").  But this is more
> speculation than anything else.
Some considerations about this:

Python vs. Smalltalk vs. Self

Smalltalk classes have rigid sets of slots, single inheritance and all slot are private (protected?),
so I imagine it is easy to number them.

In Self the set of slots of an object is also fixed (there are specific reflective methods to modify the 
set of slots) and also which are mutable etc. OTOH Self has dynamic inheritance (something like
the mutable __bases__ in Python). To deal with all this it uses customization (different version of
a single method are lazily compiled against the various concrete structural variations of objects that 
inherit it) and assume that things found up in the inheritance tree are constant but guards itself
about their changes

Python is even less rigid than Self and there is not an explicit notion of the set of slots that goes with
an object. So to achieve some improvement in slot access speed may be necessary to:
- set more rigid rules on slots (but then that's not Python anymore)

- find a way to detect some conservative approximation of the set of slots of a concrete class
- use customization (consumes memory) so you can exploit a fixed layout for the approximated slots
- use plain dictionary lookup for the slots missed that way

Not impossible, not trivial and surely a thing that will pollute code clarity, OTOH
whith this kind of optimizations in place, even just at the interp level, effective native
compilation is not that far ...

regards, Samuele Pedroni.