[pypy-dev] Re: Base Object library (was: stdobjspace status)

Christian Tismer tismer at tismer.com
Thu Feb 27 20:05:12 CET 2003


Armin Rigo wrote:
...

> The W_IntObject class itself is supposed to be one possible implementation of 
> Python's integers, described in a high-level point of view.  The whole point 
> is in that word "high-level".  We (or at least I) want the objspace/std/* 
> files to be translatable to whatever actual low-level language that we like, 
> and not only C -- this is why the bitfield metaphor is not necessarily a good 
> thing.

We probably don't need bitfields now.

> When you are talking about r_int and similarly r_float or even r_long 
> wrappers, you want to make it clear that you mean a "machine-level" plain 
> object.

There is no r_float, since floats are just machine-level.
There is also no r_long, since there are no longs at all.

> But actually that was the whole point of RPython.  The C translation 
> of all the interpreter-level code will produce machine-level "int"s.  In other 
> words there is no deep reason to use a special wrapping mechanism for ints, 
> floats, lists, whatever, to implement the W_XxxObject classes, because we will 
> need *exactly the same* low-level objects to implement just *everything else* 
> in the interpreter!  For example, when the interpreter manipulates the current 
> bytecode position, it manipulates what looks like Python integers and strings 
> ---
> 
>    opcode = ord(self.bytecode[nextposition])
>    nextposition += 1
> 
> --- but we want this code to be translated into efficient C code, too.  The 
> last thing we want is to have to rewrite *all* this code with r_ints!

Right, and I never said we should.
I just need the r_ints to make the emulation
behave correctly when we are running it
in CPython!
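
Just so we mean the same thing, here is roughly the kind of r_int
I have in mind -- only a sketch, and the 32-bit bounds are just an
assumption for the emulation, not a design decision:

    class r_int(int):
        # Sketch only: emulate a machine-level signed integer while the
        # interpreter runs on top of CPython.  The 32-bit bounds below
        # are an assumption; the translator may pick any real word size.
        _MAX = 2**31 - 1
        _MIN = -2**31

        def __add__(self, other):
            result = int(self) + int(other)
            if not self._MIN <= result <= self._MAX:
                raise OverflowError("r_int addition overflowed")
            return r_int(result)

        # __sub__, __mul__ and friends would follow the same pattern.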

> So the *only* point of r_ints, as far as I can tell, is to have explicit
> control over bit width and overflow detection.  There is no built-in Python
> type "integer with n bits" that generates the correct OverflowErrors.  And
> there is no unsigned integer type, hence the need for r_uint.  But apart from
> that, just use plain integers and it is fine!  That is the whole purpose of
> writing our pypy in Python, isn't it?  Creating r_long, r_float, r_list...  
> looks like we are reinventing our own language.

Forget about r_long and r_float; I can't remember
us talking about those, and at least I didn't have them in mind.

r_list is something I'm less sure about, because
I don't believe that we can create all the code by
pure magic. Somebody has to write the future list
implementation (which has nothing to do with r_list
except for using it as a building block).
But I believe it is bad to use regular lists together
with things like append(), or list1 + list2.
These are not primitive and not available in the
C or assembly targets, so why may we assume they exist?

For that reason alone, I wanted to create a subtype
of list which is simply not capable of doing certain
things.
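
To make that concrete, something along these lines is what I mean --
the name and the exact set of blocked operations are only a sketch:

    class r_fixedlist(list):
        # Sketch only: a list that models a malloc'ed block of slots.
        # Indexing, assignment and iteration still work, but anything
        # that would change the size is forbidden, because the C or
        # assembly target has no such primitive.
        def _blocked(self, *args):
            raise TypeError("r_fixedlist has a fixed size")
        append = extend = insert = pop = remove = _blocked
        __iadd__ = __add__ = _blocked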

> The confusion that may have led to this point is that we never explicitly
> said what objects we would allow in RPython.  Remember, RPython is supposed to
> be Python with a few restrictions whose goal is *only* to ease the job of type
> inference and later translation.  But the *existing* high-level constructs of 
> Python must be kept!  If we don't use them, we are back again at the C level 
> and there is no point in writing pypy in Python.

Then you should be clearer about what exactly we
may use from Python.
...

>>So, W_ListObject
>>would have some fields like
>>    self.objarray    # r_objectarray
>>    self.maxlen      # r_uint
>>    self.actlen      # r_uint
>>
>>It has been suggested to not use maxlen, since
>>we could use len(self.objarray), but I believe
>>this is wrong to do. If we are modelling primitive
>>arrays, then they don't support len() at all.
> 
> 
> I feel exactly the opposite.  Just *use* real Python lists all the way,
> including most of their normal operations.  This is nice because they describe
> high-level operations.  In my view a W_ListObject would have only two fields:
> 
>  * self.objarray    # a real Python list of wrapped items
>  * self.length      # the actual list length
> 
> with self.length <= len(self.objarray).  The self.objarray list grows by more 
> than one item at a time, to amortize the cost of adding a single element at a 
> time, just like what the current CPython implementation does.

That sounds inconsistent to me. On the one hand you use the
length attribute, so the thing cannot be translated
into a plain pointer to a memory area; it is something more.
On the other hand, you repeat the C implementation's
piecewise over-allocation.
Where do you draw the line?
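
For comparison, here is how I read your proposal; the names and the
growth rule are mine, so take it only as a sketch:

    class W_ListObject(object):
        # Sketch of the two-field layout: a real Python list of wrapped
        # items plus an explicit length, growing by more than one slot
        # at a time.  The growth factor here is made up.
        def __init__(self, wrappeditems):
            self.length = len(wrappeditems)
            self.objarray = wrappeditems + [None] * 4   # some spare room

        def append(self, w_item):
            if self.length == len(self.objarray):
                # grow the underlying block before storing the new item
                self.objarray.extend([None] * (self.length // 2 + 1))
            self.objarray[self.length] = w_item
            self.length += 1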

> All this works 
> fine, is normal pure Python code, and can be expected to translate to an 
> efficient C implementation in which lists are just translated into malloc'ed 
> blocks of PyObject* pointers.  The point is that we still have a nice Pythonic 
> high-level implementation of W_ListObject but nevertheless can translate it 
> into an efficient low-level implementation.

Being explicit about the actual length but implicit
about the maximum length seems a bit arbitrary to me.
My idea was to describe a piece of memory by using
a list: the piece of memory that malloc returns.
So I thought I needed to model its length as well.

If I don't model the length but rely on a code generator
to figure it out, then I have the problem
that this code generator does not exist yet, while I want
to be able to generate code now, as a proof of concept.
For that reason I preferred an implementation that is
so basic that even I could write a code generator for it,
without any magic that I don't understand.

It's a bottom-up approach; maybe it is the wrong way,
but then I need to be convinced. At the moment, I feel
completely blocked.

> Another point I would like to raise is that we don't have to give types to 
> every variable in pypy, like enforcing that a slot contains an r_int.  This is 
> just unPythonic.  *All* types are expected to be computed during type 
> inference.  So don't even say that W_ListObject.objarray is a list of wrapped 
> objects or Nones -- this can be figured out automatically.

You make the false assumption that all the code
is correct. The reason why I'd like to restrict
the objarray, and especially the array used as the
value stack in frames, is better debugging.
I was just told that we currently
sometimes have "wrong" objects on the stack.
Exactly that would be found easily if I provided
a modified array that checks for exactly
that. I can't see how debugging is unpythonic.
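
A checking stack of this kind is really all I am asking for.  A rough
sketch (the names are made up, and W_Object stands for whatever our
common wrapped base class turns out to be):

    class W_Object(object):
        # placeholder for the common wrapped-object base class
        pass

    class CheckedStack(object):
        # Sketch only: a value stack that, while we run on CPython,
        # refuses anything that is not a wrapped object.  It is purely
        # a debugging aid and would not survive translation.
        def __init__(self):
            self.items = []

        def push(self, w_obj):
            assert isinstance(w_obj, W_Object), "non-wrapped object pushed!"
            self.items.append(w_obj)

        def pop(self):
            return self.items.pop()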

> ..from Paolo's reply:
> 
> 
>>In other words... is not the *W_IntObject with r_int* one of the *possible*
>>choices that pypy can make for representing (for target code emission!) a
>>plain python int (c)?
>>Instead of r_int, for example, I can choose to use a tagged representation
>>of an int, and write a t_int that checks the overflow and similar problems
>>(and where to check arithmetic?)...(d).

This is a misunderstanding.
I wasn't talking about using integer objects at all; the r_int
emulates what will later be a machine integer.
This isn't about tagging. I don't intend to use integer
/objects/ for implementing lists, for instance; I certainly
mean all those integers to be turned into machine words.
Again, this is probably because I thought that in /std/ we
do what's needed for generating C code.

> This is right.  Consider for example the case of *two* different
> implementations (that would both appear as having the 'int' type to users).  
> Say that one is like Christian's W_IntObject+r_int, and the other one can only
> encode small, tagged integers.  The choice to use one or both representations
> in an actual C implementation must be made by the RPython-to-C translator, and
> not in the object space.  For example, if we want to produce something very
> similar to the current CPython, then we have no use for small, tagged
> integers.  The question is thus, "how do we express things to allow for this?"

I had the impression that the "std" object space is meant to be
the one to be translated into C. Therefore I was trying to
write directly translatable code. But I see this goal disappearing...

What I wanted to do is a 'real' implementation of the
basic types, with 'real' algorithms, taken from
the C sources, with some simplifications and
clarifications gained by using Python, but without using
Python objects.
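
To show what I mean by "taken from the C sources but written in
Python", the addition of two int objects could look roughly like
this -- the attribute name and the constructor use are only
illustrative:

    def int_int_add(w_int1, w_int2):
        # Sketch only: the algorithm of CPython's int_add, but the
        # operands are r_ints, so overflow shows up as OverflowError
        # exactly as it would on the machine-level target.
        x = w_int1.intval
        y = w_int2.intval
        z = x + y            # may raise OverflowError via r_int
        return W_IntObject(z)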

If this is not the place to put it, then I'd
like to ask where that place would be.

> Similarly, we may provide different implementations for lists, dictionaries,
> whatever; we may even consider that Python's "long" type is an unneeded hack,
> for long integers could be just another implementation for the 'int' type,
> which goes very much in the direction that Python seems to go with the recent
> automatic conversions of overflows to longs.

I agree. So are things like W_IntObject actual
implementations, or just interface definitions?

...

> In an effort to go in that direction I'd like to add that nothing has been 
> done yet about:
> 
>  * built-in methods (like list.append); the StdObjSpace.xxx.register() trick
>    only works for built-in operators
>  * non-built-in operators and methods, e.g.
>    implementing something like long_long_add in application-space
>    (longobject_app.py).

Yes. This hasn't happened yet, since we have no coherent design
for how it should be done.

- chris

-- 
Christian Tismer             :^)   <mailto:tismer at tismer.com>
Mission Impossible 5oftware  :     Have a break! Take a ride on Python's
Johannes-Niemeyer-Weg 9a     :    *Starship* http://starship.python.net/
14109 Berlin                 :     PGP key -> http://wwwkeys.pgp.net/
work +49 30 89 09 53 34  home +49 30 802 86 56  pager +49 173 24 18 776
PGP 0x57F3BF04       9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
      whom do you want to sponsor today?   http://www.stackless.com/


