[pypy-dev] Objects and types in the stdobjspace

Armin Rigo arigo at tunes.org
Mon Jun 9 13:09:58 CEST 2003


Hello Rocco,

On Sun, Jun 08, 2003 at 11:10:52PM -0400, Rocco Moretti wrote:
> But I'm still a little hazy. Could you walk us through how this scheme 
> would work with multiple implementations of the same type and specifics as 
> to how this would be different from the (CPython) current implementation 
> of types being the same as the implementation?

Ok. In CPython the type completely defines the implementation, via the
PyTypeObject structure (tp_basicsize and all the function pointers). All
objects have a type pointer (ob_type), which is a very fast way to retrieve
both the type of the object and its implementation (as they are the same
thing).
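
In Python terms, the C-level picture is roughly the following (just an
illustration, of course; the real thing is a struct field access and a
function pointer call):

    # Rough Python rendering of CPython's C-level model: the type object
    # *is* the implementation record, reached via every object's ob_type.
    a, b = 6, 7
    impl = type(a)                  # what reading ob_type gives you in C
    result = impl.__add__(a, b)     # what tp_as_number->nb_add does
    assert result == 13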

Ultimately this PyObject structure is something that we will want to have in
*some* versions of PyPy, so that we can be compatible with CPython's extension
modules, at least. I'm confident that we'll be able to do that automatically
based on the current (different) way the stdobjspace works. This is what we
did with multimethods: during the latest sprint we could tie them back to
Python's __add__ & co. methods, but internally we use multimethods all the way
simply because they are more natural in our context.
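
To make the idea concrete, here is a minimal Python sketch of a
two-argument multimethod; the class and its details are invented for
illustration, not the actual code in the repository:

    class MultiMethod:
        """Dispatch on the implementation classes of both arguments."""
        def __init__(self, name):
            self.name = name
            self.table = {}               # (class1, class2) -> function

        def register(self, cls1, cls2, func):
            self.table[cls1, cls2] = func

        def __call__(self, space, w_a, w_b):
            func = self.table[w_a.__class__, w_b.__class__]
            return func(space, w_a, w_b)

    add = MultiMethod('add')
    # Tying this back to __add__ is then just a thin forwarding layer:
    # the type's __add__ slot ends up calling add(space, w_self, w_other).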

The "natural" way to look at the object implementation of the stdobjspace is
that there is only a finite and reasonably small number of different
implementations defined (unlike types, of which the user can create as many as
he wishes). The translator-to-C program can thus implement a wrapped object
with a struct that has some (arbitrary) tag to distinguish between
implementations. In CPython, the tag is the ob_type field. In our case we are
free to choose whatever we want. For example it could be a small integer:
0=W_IntObject, 1=W_ListObject, 2=W_TupleObject, and so on. It can be
completely arbitrary. It could also be a pointer to some data describing the
implementation. But using a small integer makes multimethod dispatch extremely
fast: to dispatch on objects 'a' and 'b', read 'a->tag' and 'b->tag' and use
them as indices into an (N by N) table of function pointers. That is
considerably simpler and faster than CPython's way of playing around with
each object's type's tp_as_number->nb_add.
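
Here is a rough Python model of that table (the real thing would be C
structs and arrays emitted by the translator; the tag values and function
names below are made up for illustration):

    TAG_INT, TAG_LIST, TAG_TUPLE = 0, 1, 2   # arbitrary small-integer tags
    N_TAGS = 3

    class W_IntObject:
        tag = TAG_INT
        def __init__(self, intval):
            self.intval = intval

    def add__Int_Int(a, b):
        return W_IntObject(a.intval + b.intval)

    def add__unsupported(a, b):
        raise TypeError("unsupported operand implementations")

    # The N-by-N table of function pointers, indexed directly by the tags.
    add_table = [[add__unsupported] * N_TAGS for _ in range(N_TAGS)]
    add_table[TAG_INT][TAG_INT] = add__Int_Int

    def dispatch_add(a, b):
        # two field reads and one indexed call -- no walk through
        # tp_as_number->nb_add as in CPython
        return add_table[a.tag][b.tag](a, b)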

So much for the motivation. Now, several implementations of the same type can
coexist, provided we give some heuristics to select between them. For example,
suppose we have two string implementations, a W_StringObject and a
W_ConcatenatedStringObject. Then the concatenation of two W_StringObjects
should check whether the resulting string would grow larger than some
threshold; if so, instead of building and returning a W_StringObject it builds
and returns a W_ConcatenatedStringObject, enabling new algorithms to carry on
with the manipulation of what the user still sees as a plain string.
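
A sketch of that selection heuristic, with an invented threshold and
invented field names (none of this exists in the repository yet):

    THRESHOLD = 256                    # invented cut-off, in characters

    class W_StringObject:
        def __init__(self, chars):
            self.chars = chars         # eagerly materialized string

    class W_ConcatenatedStringObject:
        def __init__(self, w_left, w_right):
            self.w_left = w_left       # keep both halves and only join
            self.w_right = w_right     # them when somebody really asks

    def add__String_String(space, w_a, w_b):
        if len(w_a.chars) + len(w_b.chars) > THRESHOLD:
            # the result would be large: switch to the lazy
            # implementation; the user still just sees a plain string
            return W_ConcatenatedStringObject(w_a, w_b)
        return W_StringObject(w_a.chars + w_b.chars)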

The choice of implementation can also be made when the object is initially
built. For example, imagine we had a W_SmallIntObject implementation that can
store an integer of up to 30 bits by abusing a pointer field (e.g. by storing
a tag in the last two bits, relying on the fact that real pointers are
aligned and thus never odd). There should be (there isn't yet) an
implementation of inttype_new() in inttype.py (for calls to 'int(obj)'). It
should examine the numeric value to be built and, depending on whether that
value fits in 30 bits, build either a W_SmallIntObject or a W_IntObject.
Actually, 'space.wrap(someinteger)' could also invoke this mechanism.
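
As a sketch of what such an inttype_new() might look like (the 30-bit test
and the W_SmallIntObject field are assumptions; in the translated C the
small value would live in the tagged pointer bits themselves):

    class W_IntObject:
        def __init__(self, intval):
            self.intval = intval

    class W_SmallIntObject:
        def __init__(self, intval):
            self.intval = intval   # in C: packed into the pointer bits

    def fits_in_30_bits(value):
        return -2**29 <= value < 2**29   # signed 30-bit range

    def inttype_new(space, w_inttype, w_value):
        value = space.unwrap(w_value)
        if fits_in_30_bits(value):
            return W_SmallIntObject(value)
        return W_IntObject(value)

    # space.wrap(someinteger) could route through the same test.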


A bientôt,

Armin.

