
Bengt Richter wrote:
At 20:03 2003-01-27 +0100, Samuele Pedroni wrote:
----- Original Message -----
From: "holger krekel" <hpk@trillke.net>
To: "Armin Rigo" <arigo@tunes.org>; <pypy-dev@codespeak.net>
Sent: Monday, January 27, 2003 6:00 PM
Subject: Re: [pypy-dev] Builtin types
[Armin Rigo Mon, Jan 27, 2003 at 01:18:00PM +0100]
Hello,
On Fri, Jan 24, 2003 at 05:48:42PM +0100, Samuele Pedroni wrote:
OTOH I think a higher level of abstraction is necessary to target more general backends.
I agree with Samuele that we should not focus on ctypes or any other kind of structs right now. For all of ctypes' power I believe that it is not central to Python-in-Python. This will become important later, when we target C.
how do you intend to use any of the existing C-libraries, then? Rely on CPython to provide the binding?
I think that progressing in the ctypes direction can happen in parallel with Python-Core pythonifications. Being able to make C-library calls (like File-IO) without specialized intermediate C-code does seem like an important feature.
The point was whether you want your builtin types "abstractions" to be directly ctypes based.
IMO (FWIW) no, but OTOH I think the functionality is needed. So in order to get the "abstractions" right, perhaps a thin wrapper around ctypes would be a practical near-term step. Then the question becomes what the "abstractions" involved in calling on ctypes really are, and what that thin wrapper should look like. It is easy to draw a line and say crossing it is an OS API call, but I am thinking the PyPy situation is more complex than that, and instead of lines, a foam of nesting bubble boundaries may be needed ;-)
The reason why I thought we would need something like ctypes is this: By its nature, plain Python has no way to describe physical memory layouts and primitive types. There is the struct module with its limitations, but this is insufficient. Plain Python also does not have a way to describe restricted types at all, since it has no type declarations.

The minor point was to be able to re-build existing C structures. This may become interesting when we really try to build compatibility. More urgent to me is to be able to describe integer cells of fixed width and other primitive types. They have the known semantics of primitive C types. If we use Python integers all the time to describe the C implementation of builtin types, we end up with lots of hairy tricks to describe how they do not overflow but wrap around, how unsigned integers are right-shifted without sign extension, and all of that. The idea is to bind those semantics to ctypes instances. Rethinking this initial idea, I admit that it is equally possible to do that with custom classes, which can be defined to have these semantics.

I believe that we need these primitive types, or the re-implementation of Python innards will differ much more from the original than we intended. There are already enough differences due to the different nature of the Python language. In order to keep as much of the existing code as possible for an initial bootstrap, I don't believe it is good to have to re-think each and every internal module in terms of different data types. Instead, I think it is easier to just focus on the changed language layout, lack of certain constructs and different loop layouts, while leaving most of the data type behavior as it is.

A small example: For some benchmarking tests, I once re-implemented Python's MD5 module in Python, the best way I could. The source ended up very similar to the original, and only slightly *longer*!
This is due to the fact that the algorithm made use of unsigned integers and their shifting properties all the time. For my implementation, that became quite a nightmare of casts to long integers, together with masking with & 0xffffffff in order to keep the longs short. This is quite nasty, almost totally prevented optimization by Psyco, and was disappointing. The alternative, rewriting the whole program to use only integer operations, would have led to many more lines of code and to a whole set of new complications, since every statement would have to be tested for the signs of its arguments.

For the curious, I'd be happy to post this code for study, and I'd like to encourage everybody who doesn't believe me to try to implement MD5 without using a single long integer.

Conclusion: My wish to use ctypes or some similar abstraction for primitive types comes from the observation that it is not always trivial to model primitive types with Python's types, and I think trying this is counter-productive, since we will eventually *have* to use primitive types to get to a useful implementation.

cheers - chris
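
To make the contrast above concrete, here is a minimal sketch in present-day Python. It is not the MD5 code chris mentions (that code was not posted in this message). The names `MASK32`, `rotl32_masked` and `UInt32` are hypothetical, invented for illustration. The first function shows the explicit-masking style; the class shows how a custom type could carry the wrap-around and unsigned-shift semantics instead. The last lines assume the `ctypes` module that ships with current CPython, whose fixed-width integer types wrap without overflow checking.

```python
import ctypes

MASK32 = 0xFFFFFFFF  # hypothetical helper constant: keeps an int within 32 bits


def rotl32_masked(x, n):
    """Rotate a 32-bit value left by n bits with plain ints and explicit masks."""
    n %= 32
    x &= MASK32
    return ((x << n) | (x >> (32 - n))) & MASK32


class UInt32:
    """Hypothetical custom class with C-like unsigned 32-bit semantics."""

    __slots__ = ("value",)

    def __init__(self, value=0):
        # Every result is truncated here, so operators need no masks of their own.
        self.value = int(value) & MASK32

    def __int__(self):
        return self.value

    def __repr__(self):
        return "UInt32(0x%08x)" % self.value

    def __eq__(self, other):
        if isinstance(other, (UInt32, int)):
            return self.value == int(other)
        return NotImplemented

    def __hash__(self):
        return hash(self.value)

    def __add__(self, other):
        return UInt32(self.value + int(other))  # wraps instead of growing

    __radd__ = __add__

    def __and__(self, other):
        return UInt32(self.value & int(other))

    def __or__(self, other):
        return UInt32(self.value | int(other))

    def __xor__(self, other):
        return UInt32(self.value ^ int(other))

    def __invert__(self):
        return UInt32(~self.value)

    def __lshift__(self, n):
        return UInt32(self.value << n)  # bits shifted past bit 31 are dropped

    def __rshift__(self, n):
        return UInt32(self.value >> n)  # value is never negative: no sign extension

    def rotl(self, n):
        return UInt32(rotl32_masked(self.value, n))


print(UInt32(0xFFFFFFFF) + 1)           # wraps around to zero
print(UInt32(0x80000000) >> 31)         # top bit moves down to bit 0
print(ctypes.c_uint32(2**32 + 5).value) # ctypes truncates the same way
```

With such a class the masking lives in one place (the constructor) rather than in every statement of the algorithm, which is the design trade-off the message argues for. Whether this is fast enough, or friendly to an optimizer like Psyco, is a separate question this sketch does not address.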