[Jeremy Hylton]
Thanks for the good questions and suggestions. Too bad you can't come to dev day. I'll try to post slides before or after the talk -- and update the PEP.
Here are some more wild ideas, probably more thought-provoking than useful, but this is really an area where only the profiler knows the truth <wink>.
SP> And Python with modules, data-objects, class/instances, types
SP> etc is quite a zoo :(.
And, again, this is a problem. The same sorts of techniques apply to all namespaces. It would be good to try to make the approach general, but some namespaces are more dynamic than others. Python's classes, lack of declarations, and separate compilation of modules mean that class/instance namespaces are hard to do right. We need to defer a lot of final decisions to runtime and keep an extra dictionary around just in case.
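A minimal Python-level sketch of that "guessed layout plus extra dictionary" idea, purely for illustration (the class GuessedLayoutInstance and its layout are hypothetical, not how CPython actually stores instances): attributes guessed ahead of time get fixed array slots, anything else falls back to an ordinary dict.

    _UNSET = object()   # marker for "slot allocated but never assigned"

    class GuessedLayoutInstance:
        def __init__(self, guessed_attrs):
            # Fast-path storage: a fixed table for the guessed attributes.
            object.__setattr__(self, "_slot_index",
                               {name: i for i, name in enumerate(guessed_attrs)})
            object.__setattr__(self, "_slots", [_UNSET] * len(guessed_attrs))
            # Slow-path storage: the extra dictionary kept around just in case.
            object.__setattr__(self, "_overflow", {})

        def __setattr__(self, name, value):
            i = self._slot_index.get(name)
            if i is not None:
                self._slots[i] = value          # guessed attribute: array store
            else:
                self._overflow[name] = value    # unguessed attribute: dict store

        def __getattr__(self, name):
            i = self._slot_index.get(name)
            if i is not None and self._slots[i] is not _UNSET:
                return self._slots[i]
            try:
                return self._overflow[name]
            except KeyError:
                raise AttributeError(name) from None

    p = GuessedLayoutInstance(["x", "y"])
    p.x = 1            # lands in the slot array
    p.colour = "red"   # lands in the fallback dict
    print(p.x, p.colour)   # -> 1 red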
* instance namespaces

As I said, but what will eventually happen with class/type unification plays a role.

1. __slots__ are obviously a good thing here :) (see the small example below)

2. old-style instances, and in general instances with a dict: one can try to guess the slots of a class by looking for the "self.attr" pattern at compile time, in a more or less clever way. The set of compile-time guessed attrs would be passed to MAKE_CLASS, which would construct the runtime guess as the union of the super-classes' guesses and the compile-time guess for the class. This information can be used to lay out a dlict. (A sketch of such a compile-time scan follows below.)

* zoo problem

[yes, as I said, this whole inline-cache thing is supposed to trade memory for speed. And the fact that Python's internal objects are so inhomogeneous/polymorphic <wink> does not help to keep that amount small; having only new-style classes, for example, would help.]

Ideally one can assign to each bytecode in a code object whose behavior depends/dispatches on the concrete object "type" a "cache line" (or many; the polymorphic inline caches in modern Smalltalk implementations do that, in the context of the JIT). (As long as the GIL is there, we do not need per-thread versions of the caches.)

The first entries in the "cache line" could contain the PyObject type and then a function pointer, so we would have common logic like:

    if PyObjectType(obj) == cache_line.type:
        cache_line.onType()
    else:
        ...

Then the per-type code could use the rest of the space in the cache line polymorphically, to hold type-specific cached "dispatch" info: e.g. the index of a dict entry for the load_attr/set_attr logic on an instance ... (a toy model follows below).

Abstractly, one can think of the cache line for a bytecode as a streamlined version, in terms of values and/or code pointers, of the path taken the last time that bytecode executed, plus the values needed to check whether the very same path still makes sense.

1. in practice these ideas can perform very poorly;
2. this tries to address things/internals as they are;
3. yup, anything on the object-layout/behavior side that simplifies this picture is probably a step in the right direction.

regards, Samuele.
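Regarding item 1 above, a tiny example of what __slots__ buys: instances get a fixed attribute layout instead of a per-instance dict.

    class WithDict:
        def __init__(self, x, y):
            self.x, self.y = x, y

    class WithSlots:
        __slots__ = ("x", "y")          # fixed attribute layout, no __dict__
        def __init__(self, x, y):
            self.x, self.y = x, y

    print(hasattr(WithDict(1, 2), "__dict__"))    # True  -- dict per instance
    print(hasattr(WithSlots(1, 2), "__dict__"))   # False -- array-like layout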
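Regarding item 2, a rough Python-level approximation of the "guess slots from self.attr at compile time" idea. The real thing would happen inside the compiler; here the ast module stands in for it, and guess_slots plus the union-with-bases step (standing in for the hypothetical MAKE_CLASS) are illustrative names only.

    import ast
    import inspect
    import textwrap

    def guess_slots(cls):
        """Collect names assigned via 'self.attr = ...' anywhere in cls's source."""
        tree = ast.parse(textwrap.dedent(inspect.getsource(cls)))
        guessed = set()
        for node in ast.walk(tree):
            if isinstance(node, ast.Assign):
                for target in node.targets:
                    if (isinstance(target, ast.Attribute)
                            and isinstance(target.value, ast.Name)
                            and target.value.id == "self"):
                        guessed.add(target.attr)
        # "Runtime" step: union of the superclasses' guesses and this class's
        # own compile-time guess (what the MAKE_CLASS-style step would compute).
        for base in cls.__bases__:
            guessed |= getattr(base, "_guessed_slots", frozenset())
        cls._guessed_slots = frozenset(guessed)
        return cls

    @guess_slots
    class Point:
        def __init__(self, x, y):
            self.x = x
            self.y = y

    print(sorted(Point._guessed_slots))   # ['x', 'y']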
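And a toy model of the per-bytecode "cache line" for an attribute load: one cached type plus one cached fetch path per call site. This is only a Python-level illustration (CacheLine and cached_load_attr are invented names); the real cache would sit next to the bytecode in the eval loop, in C, and would cache something like a dict-entry index rather than a closure.

    class CacheLine:
        __slots__ = ("tp", "fetch")     # cached type + cached "code pointer"
        def __init__(self):
            self.tp = None
            self.fetch = None

    def cached_load_attr(obj, name, cache):
        if type(obj) is cache.tp:
            return cache.fetch(obj)            # hit: replay last time's path
        value = getattr(obj, name)             # miss: full generic lookup
        if name in getattr(obj, "__dict__", {}):
            # Attribute lives in the instance dict; remember how to grab it.
            cache.fetch = lambda o: o.__dict__[name]
        else:
            # Found on the type (method, property, ...); fall back to getattr.
            cache.fetch = lambda o: getattr(o, name)
        cache.tp = type(obj)
        return value

    class Point:
        def __init__(self, x, y):
            self.x, self.y = x, y

    cache = CacheLine()
    p, q = Point(1, 2), Point(3, 4)
    print(cached_load_attr(p, "x", cache))   # miss: fills the cache line -> 1
    print(cached_load_attr(q, "x", cache))   # hit: same type, fast path  -> 3

A real implementation would also have to invalidate the cache when the instance dict or the class changes behind its back; the toy above simply trusts the type check, which is part of why caveat 1 above warns that these schemes can perform poorly in practice.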