[Python-Dev] optimizing non-local object access

Samuele Pedroni <pedroni@inf.ethz.ch>
Thu, 9 Aug 2001 20:50:33 +0200 (MET DST)


>   SP> - find a way to detect some conservative approximation of the
>   SP>   set of slots of a concrete class
> 
> I think this can be done with better compiler support.
> 

The simplest thing would be to detect every 'attr' that appears in 'self.attr'
form in the bodies of the methods of a class statement.  Class construction at
run-time would take the union of this set with the information computed for
the base classes.  Why at run-time?  It spares the compiler the trouble of
going from poor-man's to full-fledged type inference.  It would be reasonably
effective and at most would require the programmer to put a
self.myFavoriteAttr = None in __init__.
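
A minimal sketch of that detection, assuming today's standard `ast` module
and working on source text rather than inside the compiler.  The helper names
self_attrs and approx_slots are hypothetical.  Note that it also picks up
names used as self.method(), so the result over-approximates the data slots:

```python
import ast
import textwrap

def self_attrs(class_source):
    """Names used as self.<name> in the methods of the first class found."""
    tree = ast.parse(textwrap.dedent(class_source))
    cls = next(n for n in ast.walk(tree) if isinstance(n, ast.ClassDef))
    found = set()
    for item in cls.body:
        if not isinstance(item, ast.FunctionDef) or not item.args.args:
            continue
        # First parameter of the method, conventionally called 'self'.
        self_name = item.args.args[0].arg
        for node in ast.walk(item):
            if (isinstance(node, ast.Attribute)
                    and isinstance(node.value, ast.Name)
                    and node.value.id == self_name):
                found.add(node.attr)
    return found

def approx_slots(own_attrs, base_slot_sets):
    """Union done at class-construction time: own attrs plus those of bases."""
    slots = set(own_attrs)
    for base_slots in base_slot_sets:
        slots |= base_slots
    return slots

SRC_A = '''
class A:
    def __init__(self, a=2):
        self.a = a
    def double(self):
        self.a *= 2
'''

slots_of_a = self_attrs(SRC_A)
```

Anything assigned only from outside the class (obj.x = 1) is missed, which is
where the self.myFavoriteAttr = None idiom would come in.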

>   SP> - use customization (consumes memory) so you can exploit a fixed
>   SP>   layout for the approximated slots
>   SP> - use plain dictionary lookup for the slots missed that way
> 
>   Right 
Clearly this should go hand in hand with method dispatch.
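
To make the fixed-layout-plus-fallback idea concrete, here is a rough sketch
in pure Python.  Layout and Instance are hypothetical names, and the
name-to-index lookup shown here is itself a dictionary access; the assumption
is that customized code would have the index baked in and go straight to
self.fixed[i]:

```python
_MISSING = object()  # marks a fixed slot that has not been assigned yet

class Layout:
    """Fixed positions for the approximated slots of one concrete class."""
    def __init__(self, names):
        self.index = {name: i for i, name in enumerate(sorted(names))}

class Instance:
    """Instance storage: a fixed array for known slots, a dict for the rest."""
    def __init__(self, layout):
        self.layout = layout
        self.fixed = [_MISSING] * len(layout.index)
        self.extra = {}

    def set(self, name, value):
        i = self.layout.index.get(name)
        if i is None:
            self.extra[name] = value      # slot missed by the approximation
        else:
            self.fixed[i] = value         # slot with a known position

    def get(self, name):
        i = self.layout.index.get(name)
        if i is None:
            try:
                return self.extra[name]
            except KeyError:
                raise AttributeError(name) from None
        value = self.fixed[i]
        if value is _MISSING:
            raise AttributeError(name)
        return value

obj = Instance(Layout({"a", "b"}))
obj.set("a", 1)    # lands in the fixed part
obj.set("z", 9)    # not in the layout, lands in the fallback dictionary
```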

> 
>   SP> Not impossible, not trivial and surely a thing that will pollute
>   SP> code clarity, OTOH with this kind of optimization in place,
>   SP> even just at the interp level, effective native compilation is
>   SP> not that far ...
> 
> It is a lot of work for instances, so I've been considering it
> separately from the module globals issue.  But I think it is
> possible.  Short of native-code, I wonder if we can extend the Python
> VM with opcodes that take advantage of slots.

It is indeed a good idea, but the problem is that customising without native
compilation can have a very poor ratio of compilation time to run-time
speedup.  OTOH we could customize only hot spots.  But I think it is worth
trying if we ever get to it.

>  I'm not sure how
> closely this idea is related to Psyco.

AFAIK it would be complementary.  Armin Rigo pointed out that in principle
Psyco can lay out objects at will, but I remarked that its way of dealing
with code consumes a lot of memory, so it is workable only if applied to
hot spots, and then we pay for conversion when non-Psyco code calls Psyco code.

Psyco is doing some sort of customization (a generalisation of that), so a
layout for instances that can be exploited for speed under customization is a
good thing for Psyco too, IMHO.

> Imagine we didn't special case ints in BINARY_ADD.  If you're doing
> mostly string addition, it's just slowing you down anyway :-).
> Instead, a function that expected to see ints would check the types of
> the relevant args.
> 
> def f(x):
>     return 2*x
> 
> would become something like:
> 
> def f(x):
>     if isinstance(x, int):
>         return 2 * x_as_int
>     else:
>         return 2 * x
> 
> where the int branch could use opcodes that knew that x was an integer
> and could call its __add__ method directly.

AFAIK this is more or less what Psyco does.  I have not checked the code, but
my understanding is that it goes to great lengths to do this kind of thing,
specifically where it can pay off.


Customization would mean the following:

class A:
  def __init__(self, a=2):
    self.a = a

  def double(self):
    self.a *= 2

class B:
  def __init__(self, b=3):
    self.b = b

  def inc(self):
    self.b += 1

class C(A, B):
  def __init__(self, a, b):
    # the base-class initializers need self passed explicitly
    A.__init__(self, a)
    B.__init__(self, b)

  def mul(self):
    return self.a * self.b
    
Under customization, if instances of both A and C are used and double is
called on both, there will be two versions of double to choose from: one that
expects self to be of concrete class A and one that expects it to be of
concrete class C.  Both versions can be constructed knowing the exact
place of the slot a in the objects.
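
A toy illustration of the dispatch side of this, as a sketch only.  The
customize decorator and the specialize helper are hypothetical, and
specialize merely wraps the original function; the assumption is that a real
implementation would generate code here that knows the slot offsets for the
given concrete class:

```python
def specialize(func, cls):
    """Stand-in for building a version of func tailored to cls."""
    def specialized(self, *args, **kwargs):
        return func(self, *args, **kwargs)
    specialized.__name__ = "%s_for_%s" % (func.__name__, cls.__name__)
    return specialized

def customize(func):
    """Keep one version of func per concrete class of self, built lazily."""
    versions = {}
    def dispatcher(self, *args, **kwargs):
        cls = type(self)                 # the concrete class, not a base
        version = versions.get(cls)
        if version is None:
            version = versions[cls] = specialize(func, cls)
        return version(self, *args, **kwargs)
    dispatcher.versions = versions
    return dispatcher

class A:
    def __init__(self, a=2):
        self.a = a

    @customize
    def double(self):
        self.a *= 2

class B:
    def __init__(self, b=3):
        self.b = b

class C(A, B):
    def __init__(self, a, b):
        A.__init__(self, a)
        B.__init__(self, b)

x = A()
y = C(1, 2)
x.double()   # first call on an A instance creates the A version
y.double()   # first call on a C instance creates the C version
names = sorted(v.__name__ for v in A.double.versions.values())
```

The memory cost mentioned earlier shows up as the size of the versions table:
one entry per (method, concrete class) pair that is actually used.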

It clearly works best with good, pure single-dispatch OO code ;).

In any case, both a more static layout for slots and some way to add hooks
that count how many times a code object is invoked would be a good starting
point for a lot of possible nice experiments.
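
For the counting hook, a pure-Python approximation is easy to sketch.  The
names counted, call_counts and hot_spots are hypothetical, and a decorator is
assumed here only because it needs no interpreter changes; a hook inside the
VM would do the same thing without the wrapper overhead:

```python
import functools
from collections import Counter

call_counts = Counter()   # maps code objects to number of invocations

def counted(func):
    """Wrap func so that every call bumps the counter of its code object."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        call_counts[func.__code__] += 1
        return func(*args, **kwargs)
    return wrapper

def hot_spots(threshold):
    """Names of the code objects invoked at least threshold times."""
    return [code.co_name for code, n in call_counts.items() if n >= threshold]

@counted
def double(x):
    return 2 * x

@counted
def rarely(x):
    return x

for i in range(5):
    double(i)
rarely(0)

hot = hot_spots(3)   # candidates for customization
```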

regards, Samuele.