[pypy-dev] Re: Base Object library (was: stdobjspace status)

Thu Feb 27 11:44:35 CET 2003

Hello Christian,

Whoa, I am afraid this thread is going in the wrong direction!  As Paolo
points out, this is getting confusing.  There are already several different
levels at which things are working, and Stephan and you are adding yet another
one.  I don't say this is necessarily wrong, but things are not very clear.

The W_IntObject class itself is supposed to be one possible implementation of 
Python's integers, described from a high-level point of view.  The whole point 
is in that word "high-level".  We (or at least I) want the objspace/std/* 
files to be translatable to whatever actual low-level language we like, 
and not only C -- this is why the bitfield metaphor is not necessarily a good 
thing.

When you are talking about r_int and similarly r_float or even r_long 
wrappers, you want to make it clear that you mean a "machine-level" plain 
object.  But actually that was the whole point of RPython.  The C translation 
of all the interpreter-level code will produce machine-level "int"s.  In other 
words there is no deep reason to use a special wrapping mechanism for ints, 
floats, lists, whatever, to implement the W_XxxObject classes, because we will 
need *exactly the same* low-level objects to implement just *everything else* 
in the interpreter!  For example, when the interpreter manipulates the current 
bytecode position, it manipulates what looks like Python integers and strings 
---

   opcode = ord(self.bytecode[nextposition])
   nextposition += 1

--- but we want this code to be translated into efficient C code, too.  The 
last thing we want is to have to rewrite *all* this code with r_ints!
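
To spell out the bytecode example a little more: here is a minimal sketch 
of a frame that tracks its position with nothing but a plain int and a plain 
string.  The Frame class and the next_opcode() method are hypothetical names 
used for illustration only; this is not the actual interpreter code.

```python
class Frame:
    """Hypothetical, much-simplified frame (illustration only)."""

    def __init__(self, bytecode):
        self.bytecode = bytecode   # plain Python string, one character per opcode
        self.nextposition = 0      # plain Python int, no special wrapper

    def next_opcode(self):
        # Same two statements as in the fragment above, using ordinary
        # Python integers and strings.
        opcode = ord(self.bytecode[self.nextposition])
        self.nextposition += 1
        return opcode
```

Nothing here needs an r_int: type inference should be able to see that 
nextposition and opcode are always integers.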

So the *only* point of r_ints, as far as I can tell, is to have explicit
control over bit width and overflow detection.  There is no built-in Python
type "integer with n bits" that generates the correct OverflowErrors.  And
there is no unsigned integer type, hence the need for r_uint.  But apart from
that, just use plain integers and it is fine!  That is the whole purpose of
writing our pypy in Python, isn't it?  Creating r_long, r_float, r_list...  
looks like we are reinventing our own language.
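
As a rough illustration of what "explicit control over bit width and overflow 
detection" could mean, here is a sketch.  BoundedInt and as_unsigned are 
hypothetical names invented for this example; they are not the r_int or r_uint 
that Christian and Stephan wrote, and the real classes may well work 
differently.

```python
class BoundedInt:
    """Hypothetical stand-in for an r_int-style wrapper: a signed integer
    of fixed bit width that raises OverflowError when a value does not fit."""

    def __init__(self, value, bits=32):
        lo = -(1 << (bits - 1))
        hi = (1 << (bits - 1)) - 1
        if not lo <= value <= hi:
            raise OverflowError("%d does not fit in %d signed bits" % (value, bits))
        self.value = value
        self.bits = bits

    def __add__(self, other):
        # The range check in __init__ catches an overflowing result.
        return BoundedInt(self.value + int(other), self.bits)

    def __int__(self):
        return self.value


def as_unsigned(value, bits=32):
    """Hypothetical helper: reinterpret an integer as unsigned, modulo 2**bits."""
    return value & ((1 << bits) - 1)
```

Everything else (loop counters, indexes, lengths) can stay a plain Python 
integer.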

The confusion that may have led to this point is that we never explicitly
said what objects we would allow in RPython.  Remember, RPython is supposed to
be Python with a few restrictions whose goal is *only* to ease the job of type
inference and later translation.  But the *existing* high-level constructs of 
Python must be kept!  If we don't use them, we are back again at the C level 
and there is no point in writing pypy in Python.

A couple of more specific points now...

On Wed, Feb 26, 2003 at 07:46:12PM +0100, Christian Tismer wrote:
> So, W_ListObject
> would have some fields like
>     self.objarray    # r_objectarray
>     self.maxlen      # r_uint
>     self.actlen      # r_uint
> 
> It has been suggested to not use maxlen, since
> we could use len(self.objarray), but I believe
> this is wrong to do. If we are modelling primitive
> arrays, then they don't support len() at all.

I feel exactly the opposite.  Just *use* real Python lists all the way,
including most of their normal operations.  This is nice because they describe
high-level operations.  In my view a W_ListObject would have only two fields:

 * self.objarray    # a real Python list of wrapped items
 * self.length      # the actual list length

with self.length <= len(self.objarray).  The self.objarray list grows by more 
than one item at a time, to amortize the cost of adding a single element at a 
time, just like what the current CPython implementation does.  All this works 
fine, is normal pure Python code, and can be expected to translate to an 
efficient C implementation in which lists are just translated into malloc'ed 
blocks of PyObject* pointers.  The point is that we still have a nice Pythonic 
high-level implementation of W_ListObject but nevertheless can translate it 
into an efficient low-level implementation.
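
A minimal sketch of what I have in mind, assuming only the two fields above.  
The method names and the growth rule are assumptions made for illustration; 
CPython's actual over-allocation formula is different in its details.

```python
class W_ListObject:
    """Sketch only: a wrapped list backed by a real Python list."""

    def __init__(self):
        self.objarray = []   # real Python list; may contain unused trailing slots
        self.length = 0      # number of slots actually in use

    def append(self, w_item):
        if self.length == len(self.objarray):
            # Grow by more than one slot at a time to amortize the cost.
            # This particular growth rule is an arbitrary assumption.
            extra = (self.length >> 3) + 4
            self.objarray.extend([None] * extra)
        self.objarray[self.length] = w_item
        self.length += 1

    def getitem(self, index):
        # Only the first self.length slots hold real items.
        if not 0 <= index < self.length:
            raise IndexError("list index out of range")
        return self.objarray[index]
```

The invariant self.length <= len(self.objarray) holds after every operation, 
and len() on the underlying list is used freely.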

Another point I would like to raise is that we don't have to give types to 
every variable in pypy, like enforcing that a slot contains an r_int.  This is 
just unPythonic.  *All* types are expected to be computed during type 
inference.  So don't even say that W_ListObject.objarray is a list of wrapped 
objects or Nones -- this can be figured out automatically.

..from Paolo's reply:

> In other words... is not the *W_IntObject with r_int* one of the *possible*
> choice that pypy can choose for representing (to target code emission!) a
> plain python int (c)?
> Instead of r_int, for example, I can choose to use a tagged-rapresentation
> of an int, and write a t_int that check the overflow and so problems (and
> where to check arithmetics?)...(d).

This is right.  Consider for example the case of *two* different
implementations (that would both appear as having the 'int' type to users).  
Say that one is like Christian's W_IntObject+r_int, and the other one can only
encode small, tagged integers.  The choice to use one or both representations
in an actual C implementation must be made by the RPython-to-C translator, and
not in the object space.  For example, if we want to produce something very
similar to the current CPython, then we have no use for small, tagged
integers.  The question is thus, "how do we express things to allow for this?"

Similarly, we may provide different implementations for lists, dictionaries,
whatever; we may even consider that Python's "long" type is an unneeded hack,
for long integers could be just another implementation for the 'int' type,
which goes very much in the direction that Python seems to go with the recent
automatic conversions of overflows to longs.

The original intent of classes like W_IntObject was "one class, one
implementation", and I think that we must stick to that idea because these
classes are what are used for the multiple dispatch routines.  I don't have a
clear and complete answer for the rest of the question "how do we express
things to allow for this?".  I hope that this e-mail has clarified some
points.  Disagreement is welcome.  I apologize to Christian and Stephan
because it seems that we might have to reorganize the xxxobject.py sources,
although I'm not sure yet how.
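
To make "one class, one implementation" a bit more concrete, here is a rough 
sketch of two classes that would both appear as 'int' to users, with the 
addition routine chosen by the pair of implementation classes.  
W_SmallIntObject, register_add and the table are hypothetical names for this 
example; this is not how the StdObjSpace registration actually works.

```python
class W_IntObject:
    """One implementation of 'int': holds a plain integer."""
    def __init__(self, intval):
        self.intval = intval


class W_SmallIntObject:
    """Hypothetical second implementation of 'int', meant for small values."""
    def __init__(self, smallval):
        self.smallval = smallval


# Hypothetical dispatch table: (class, class) -> function implementing 'add'.
_add_table = {}

def register_add(cls1, cls2):
    def decorator(func):
        _add_table[(cls1, cls2)] = func
        return func
    return decorator

@register_add(W_IntObject, W_IntObject)
def int_int_add(w_a, w_b):
    return W_IntObject(w_a.intval + w_b.intval)

@register_add(W_SmallIntObject, W_SmallIntObject)
def small_small_add(w_a, w_b):
    # Range checking for "small" is omitted in this sketch.
    return W_SmallIntObject(w_a.smallval + w_b.smallval)

def add(w_a, w_b):
    # Multiple dispatch on the implementation classes of both arguments.
    try:
        impl = _add_table[(type(w_a), type(w_b))]
    except KeyError:
        raise TypeError("no 'add' implementation for this pair of classes")
    return impl(w_a, w_b)
```

How mixed pairs should be handled, and who decides which implementations 
exist in a given translation, is exactly the part that remains open.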

In an effort to go in that direction I'd like to add that nothing has been 
done yet about:

 * built-in methods (like list.append); the StdObjSpace.xxx.register() trick
   only works for built-in operators
 * non-built-in operators and methods, e.g.
   implementing something like long_long_add in application-space
   (longobject_app.py).

A bientôt,

Armin.