[pypy-dev] Builtin types

Christian Tismer tismer at tismer.com
Tue Jan 28 00:18:37 CET 2003

Bengt Richter wrote:
> At 20:03 2003-01-27 +0100, Samuele Pedroni wrote:
>>----- Original Message -----
>>From: "holger krekel" <hpk at trillke.net>
>>To: "Armin Rigo" <arigo at tunes.org>; <pypy-dev at codespeak.net>
>>Sent: Monday, January 27, 2003 6:00 PM
>>Subject: Re: [pypy-dev] Builtin types
>>>[Armin Rigo Mon, Jan 27, 2003 at 01:18:00PM +0100]
>>>>On Fri, Jan 24, 2003 at 05:48:42PM +0100, Samuele Pedroni wrote:
>>>>>OTOH I think a higher level of abstraction is necessary to target more
>>>>I agree with Samuele that we should not focus on ctypes or any other kind
>>>>structs right now.  For all of ctypes' power I believe that it is not
>>>>to Python-in-Python.  This will become important later, when we target C.
>>>how do you intend to use any of the existing C-libraries, then?
>>>Rely on CPython to provide the binding?
>>>I think that progressing in the ctypes direction can happen in
>>>parallel with Python-Core pythonifications.  Being able to make
>>>C-library calls (like File-IO) without specialized intermediate C-code
>>>does seem like an important feature.
>>The point was whether you want your builtin types "abstractions" to be directly
>>ctypes based.
> IMO (FWIW) no, but OTOH I think the functionality is needed. So in order to get the
> "abstractions" right, perhaps a thin wrapper around ctypes would be a practical
> near-term step. Then the question becomes what the "abstractions" involved in
> calling on ctypes really are, and what that thin wrapper should look like.
> It is easy to draw a line and say crossing it is an OS API call, but I am thinking
> the PyPy situation is more complex than that, and instead of lines, a foam of nesting
> bubble boundaries may be needed ;-)

The reason why I thought we would need something like ctypes
is this:
Plain Python has no way to describe physical memory layouts
and primitive types by nature. There is the struct module
with its limitations, but this is insufficient.
Plain Python also does not have a way to describe restricted
types at all, since it has no type declarations.
The minor point was to be able to re-build existing C structures.
This may become interesting when we really try to build
More urgent to me is to be able to describe integer cells of
fixed width and other primitive types. They have the known
semantics of primitive C types.
If we use Python integers all the time to describe the C implementation
of builtin types, we end up with lots of hairy tricks to describe
how they do not overflow but wrap around, how unsigned integers
are right-shifted without sign extension, and all of that.
The idea is to bind those semantics to ctypes instances.
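
To make the wrap-around and shift behavior concrete, here is a minimal
sketch using the ctypes module that ships with today's Python standard
library (at the time of this post ctypes was a separate third-party
package, so treat the exact API as an assumption about a later version).
Note that ctypes objects do not define arithmetic operators themselves;
the arithmetic below is done on plain Python integers and the ctypes
constructor is only used to truncate the result to a fixed width.

```python
import ctypes

# Unsigned 32-bit addition wraps around instead of growing without bound.
# ctypes integer constructors do no overflow checking; they truncate.
wrapped = ctypes.c_uint32(0xFFFFFFFF + 1).value

# The same 32-bit pattern read as a signed and as an unsigned cell.
as_signed = ctypes.c_int32(0x80000000).value
as_unsigned = ctypes.c_uint32(0x80000000).value

# Right shift: sign extension for the signed value,
# zero fill for the unsigned value.
signed_shift = as_signed >> 4
unsigned_shift = as_unsigned >> 4

print(wrapped, as_signed, as_unsigned, signed_shift, unsigned_shift)
```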

Rethinking this initial idea, I admit that it is equally possible
to do that with custom classes, which can be defined to have
these semantics. I believe that we need these primitive types,
or the re-implementation of Python innards will differ much more
from the original than we intended. There are already enough
differences due to the different nature of the Python language.
In order to keep as much of the existing code as possible for an
initial bootstrap, I don't believe it is good to have to re-think
each and every internal module in terms of different data types.
Instead, I think it is easier to just focus on the changed
language layout, lack of certain constructs and different loop
layouts, but leaving most of the data type behavior as it is.
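
As an illustration of the custom-class route, here is a minimal sketch
of a 32-bit unsigned cell. The class name UInt32 and its methods are
hypothetical, made up for this example; nothing like it is defined in
this thread. Every result is masked back to 32 bits, so arithmetic
wraps around and right shifts never sign-extend.

```python
class UInt32:
    """Hypothetical 32-bit unsigned integer cell with C-like semantics."""

    MASK = 0xFFFFFFFF

    def __init__(self, value=0):
        # Truncate to 32 bits; negative inputs wrap like a C cast would.
        self.value = int(value) & self.MASK

    @staticmethod
    def _raw(other):
        # Accept either another UInt32 or a plain Python integer.
        return other.value if isinstance(other, UInt32) else int(other)

    def __add__(self, other):
        return UInt32(self.value + self._raw(other))

    def __sub__(self, other):
        return UInt32(self.value - self._raw(other))

    def __and__(self, other):
        return UInt32(self.value & self._raw(other))

    def __or__(self, other):
        return UInt32(self.value | self._raw(other))

    def __xor__(self, other):
        return UInt32(self.value ^ self._raw(other))

    def __invert__(self):
        return UInt32(~self.value)

    def __lshift__(self, count):
        # Bits shifted past bit 31 are dropped by the mask in __init__.
        return UInt32(self.value << count)

    def __rshift__(self, count):
        # The stored value is never negative, so this is a zero-fill shift.
        return UInt32(self.value >> count)

    def __int__(self):
        return self.value

    def __repr__(self):
        return "UInt32(0x%08x)" % self.value


print(UInt32(0xFFFFFFFF) + 1)     # wraps around
print(UInt32(0x80000000) >> 31)   # no sign extension
```

Code ported from C could then keep its original expressions, with the
masking hidden inside the class instead of being repeated at every
operation.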

A small example: For some benchmarking tests, I once re-implemented
Python's MD5 module in Python, the best way I could.
The resulting source was very similar to the original, and only
slightly *longer*! This is due to the fact that the algorithm all
the time made use of unsigned integers and their shifting properties.
For my implementation, that became quite a nightmare of casts
to long integer, together with masking with & 0xffffffff in order to
keep the longs short. This is quite nasty, almost totally prevented
optimization by Psyco, and was disappointing.
The alternative, rewriting the whole program to use only integer
operations, would have led to many more lines of code, and
to a whole set of new complications, since every statement would have
to be tested for the signs of the arguments.
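
The masking pattern described above looks roughly like the following
sketch. This is not the code from my MD5 port (which is not included in
this post); the helper names add32, rotl32 and md5_f are made up for
illustration, and the sketch assumes a Python where int and long are a
single type, so no explicit casts to long are needed. The function F is
the round-1 auxiliary function as given in RFC 1321.

```python
MASK32 = 0xFFFFFFFF


def add32(a, b):
    """Add two values modulo 2**32, as unsigned 32-bit C addition would."""
    return (a + b) & MASK32


def rotl32(x, n):
    """Rotate a 32-bit value left by n bits (intended for 0 < n < 32)."""
    x &= MASK32
    return ((x << n) | (x >> (32 - n))) & MASK32


def md5_f(x, y, z):
    """Round-1 function from RFC 1321: F(X,Y,Z) = (X and Y) or (not X and Z)."""
    # ~x is negative in Python, so the result has to be masked again.
    return ((x & y) | (~x & z)) & MASK32


print(hex(add32(0xFFFFFFFF, 1)))
print(hex(rotl32(0x80000000, 1)))
print(hex(md5_f(0xFFFFFFFF, 0x12345678, 0x9ABCDEF0)))
```

Every operation that can set a bit above bit 31, or produce a negative
number, needs its own mask, which is where the extra length and the
clutter come from.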

For the curious, I'd be happy to post this code for studies, and I'd
like to encourage everybody who doesn't believe me to try to
implement MD5 without using a single long integer.

Conclusion: My wish to use ctypes or some similar abstraction for
primitive types comes from the observation that it is not always
trivial to model primitive types with Python's own types, and I think
trying to do so is counter-productive, since in the end we will *have*
to use primitive types to get to a useful implementation.

cheers - chris
