
Hello all. While trying to mix up some more stuff for __builtin__, I came up with an interesting problem. The solution I found for classmethod (using Google on c.l.python, since I couldn't figure it out myself) requires that classes that use it must derive from object. That made me wonder: could we just automatically derive everything from object in Minimal? The big problem with doing this in CPython, as I understand it, is the extensions that would need to be converted, but we obviously don't have that problem here. So would it be alright if I went ahead and used this code as if everything derived from object? -Scott

--
char m[9999],*n[99],*r=m,*p=m+5000,**s=n,d,c;main(){for(read(0,r,4000);c=*r; r++)c-']'||(d>1||(r=*p?*s:(--s,r)),!d||d--),c-'['||d++||(*++s=r),d||(*p+=c== '+',*p-=c=='-',p+=c=='>',p-=c=='<',c-'.'||write(1,p,1),c-','||read(2,p,1));}

Scott Fenton wrote:
No, this is not true. classmethod works for every kind of class, whether it is a new-style class, triggered by
- deriving from object
- using __slots__
- deriving from a builtin type or a descendant
- did I forget something?
or a "classic" class; it always works. For reference, see http://www.python.org/2.2/descrintro.html
No, sorry. object is just a special case for deriving a new-style class; see above. They can also be created by deriving from builtin types, or by using __slots__. Furthermore, I'm going to propose an extension to the new class system (at least for the MiniPy prototype) that goes a bit further:
- __slots__ should get the ability to denote the type of a slot to be generated, especially ctypes types
- it should be possible to derive a class from nothing, not even from object or classic, because I'd like to describe a plain C structure by classes.
The latter will allow us to define the object class and all builtin types with the same machinery. But please go ahead with your __builtins__ as it seems to fit. We can fix such stuff later. If you want to be perfect, try to define it for any class. cheers & thanks! -- chris
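[To make Christian's claim concrete, a minimal sketch in Python 2.2-era syntax, with classmethod applied by assignment since decorators did not exist yet; the class and method names are illustrative only:]

    class Classic:
        def greet(cls):
            return "classic: " + cls.__name__
        greet = classmethod(greet)

    class NewStyle(object):
        def greet(cls):
            return "new-style: " + cls.__name__
        greet = classmethod(greet)

    print Classic.greet()    # classic: Classic
    print NewStyle.greet()   # new-style: NewStyle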

At 01:43 2003-01-24 +0100, Christian Tismer wrote:
I have some thoughts on describing C structures (and anything else digital) in a canonical abstract way. I'll call such a description a meta-representation, and a function that produces it from a Python object is meta_repr(python_object) => meta_rep_obj. The reverse operation is meta_eval ;-) The basis for meta_repr is that it captures and meta-represents (type, id, value) for an object in a pure abstract form.

The abstract basis for storing the information is a bitvector object with attribute names that specify slices of the bitvector. I.e., there is effectively a dict like {'slicename': (lo, hi)} and __getattr__ uses it so that bvec.slicename refers to bvec[lo:hi] etc. This is pure and abstract data, but you can see that, interpreted little-endianly, you can very straightforwardly lay out a C struct with whatever field widths and alignments you want. I also want to make a way to create subclasses of the base bitvector class that can have specified size and named slices as class-variable info shared by instances. This is very analogous to a C struct definition, and BTW also permits unions, by just defining overlapping named slices.

BTW2, there is also a natural way to iterate based on a slice name -- i.e., just walk successive contiguous same-size slices to some limit, starting with the one defined by a name (which doesn't have to start at bit 0 of the whole, of course, so you can get embedded arrays of bit fields starting anywhere). Hence a giant bitvector with a slice defined by byte=(16,24) could iterate in 8-bit bytes starting at bit 16. meta_eval-ing such a byte-sequence meta-representation together with type and id would produce a Python character or string.

If Psyco could deal with these meta-representations in a fine-tuned way, it might be possible to have very abstract representations of things and yet generate efficient bit-twiddling assembler-level stuff, and, as you said, one consistent mechanism. One nice thing would be to be able to define machine instructions in terms of a few classes for different basic formats, with particular bit fields named right out of the book. Same for floating point doubles, etc. Packing code would be appending bit vectors, etc. Fortunately Intel is little-endian, so there is a simple mapping to the most common machine stuff, but even if that were not so, the correspondence between an abstract bit list and an abstract integer represented as an ordered set of binary coefficients for powers of 2 just naturally sums little-endianly by

    sum = 0
    for i in range(len(bitvec)): sum += bitvec[i]*2**i

though of course that's not the most efficient way to compute it.

Anyway, this meta_repr/meta_eval idea is only half baked, but in case you wanted to consider it, I wanted to offer it before you are all finished ;-) There is a bitvec.py in an old demo directory that I think I will cannibalize for a good deal of bit vector stuff, though I think I would want to base the data in a final version on arrays of ints that would map more naturally to arrays of machine ints instead of using Python longs. But for prototyping it doesn't matter. I still have to mull the class factory or perhaps metaclass stuff for creating classes whose instances will share bit-slice name definitions. Also the best way to capture nested composition and naming of whole and sub-parts, when it's done by composition instead of one flat definition. BTW, it seems pickling and marshalling are also quite related, but I haven't thought about that.
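[A hedged sketch of the named-slice bitvector idea, assuming little-endian bit order and the {'slicename': (lo, hi)} map Bengt describes; BitVec and Word are illustrative names, not code from any existing module:]

    class BitVec(object):
        slices = {}   # subclasses supply {'name': (lo, hi)}, shared by instances

        def __init__(self, value=0, nbits=32):
            self.value = value
            self.nbits = nbits

        def __getattr__(self, name):
            # only called when normal lookup fails, so plain
            # attributes like 'value' are unaffected
            try:
                lo, hi = self.slices[name]
            except KeyError:
                raise AttributeError(name)
            return (self.value >> lo) & ((1 << (hi - lo)) - 1)

    class Word(BitVec):
        # overlapping names give a C-union-like view, as described above
        slices = {'low_byte': (0, 8), 'high_byte': (8, 16), 'word': (0, 16)}

    w = Word(0x1234)
    print hex(w.low_byte)    # 0x34
    print hex(w.high_byte)   # 0x12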
Interestingly, meta_repr produces a Python object, so what does, e.g., meta_repr(meta_repr(123)) produce? It has to make sense ;-) I am mulling the encoding of type and id along with value into some composite meta_repr representation to represent a full object... Also the meaning of mixed structures, like an ordinary Python list of objects produced by meta_repr. It should be legal, but you have to keep the tin foil in good repair ;-) Please use what might further the project, and ignore the rest. But besides the possible use in PyPython, would named-slice abstract bit vectors have a chance as a PEP, as a way to express fine-grained abstract digital info? Best, Bengt

Bengt Richter <bokr@oz.net> writes:
Sounds vaguely like this paper: First-Class Data-type Representation in SchemeXerox http://citeseer.nj.nec.com/4990.html Darius

Bengt Richter <bokr@oz.net> writes:
This all sounds very similar to what is already implemented in ctypes ;-) This is the structure which stores the information for one field of a C structure or union:

typedef struct {
    PyObject_HEAD
    int offset;
    int size;
    int index;       /* Index into CDataObject's object array */
    PyObject *proto; /* a type or NULL */
    GETFUNC getfunc; /* getter function if proto is NULL */
    SETFUNC setfunc; /* setter function if proto is NULL */
} CFieldObject;

'offset' is what you call 'lo', and 'offset + size' is your 'hi' attribute. 'proto' (I should probably have chosen a better name) is a Python object holding information about the field type (such as alignment requirements and storage size), and 'getfunc' and 'setfunc' are pointers to functions which are able to convert the data from Python to C and vice versa. Instances of these CFieldObjects populate the class dict of ctypes Structure and Union subclasses; they are created by their respective metaclass from the _fields_ attribute, and are used as attribute descriptors to expose the fields to Python. Thomas
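[As a rough Python analogue of what CFieldObject does -- offset and size locate the field in raw storage, and the descriptor converts values on get/set -- here is a hedged sketch; CField, Point and the struct-module storage are illustrative assumptions, not ctypes internals:]

    import struct
    from array import array

    class CField(object):
        def __init__(self, offset, fmt):
            self.offset = offset
            self.fmt = fmt
            self.size = struct.calcsize(fmt)
        def __get__(self, obj, objtype=None):
            raw = obj.buffer[self.offset:self.offset + self.size].tostring()
            return struct.unpack(self.fmt, raw)[0]
        def __set__(self, obj, value):
            packed = struct.pack(self.fmt, value)
            obj.buffer[self.offset:self.offset + self.size] = array('c', packed)

    class Point(object):
        x = CField(0, 'i')   # two native ints, laid out back to back
        y = CField(4, 'i')
        def __init__(self):
            self.buffer = array('c', '\0' * 8)

    p = Point()
    p.x = 3
    p.y = 4
    print p.x, p.y   # 3 4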

At 09:01 2003-01-24 +0100, Thomas Heller wrote:
It does sound like a lot of similar ground is covered. Are your offset and size values in bits or bytes? (I intended bits).
The thing I think would be cool is if one could write Python to build meta_repr objects in Python, and have Psyco compile Python code manipulating those representations, winding up with machine code effectively equivalent to what the C code in your ctypes module does when given the same implicit abstract info defining fields and accessors etc., and code using those. The thing about the latter situation is that Psyco would still see code accessing a foreign interface through getfunc/setfunc, whereas if it sees Python code actually accessing the data meta-representations behind getfunc/setfunc, it has a chance to bypass function calls and generate inline machine code instead of using your canned functions. Probably a loss at first, but eventually it could be a gain, depending on Psyco? Regards, Bengt

Bengt Richter <bokr@oz.net> writes: [description of ctypes internal deleted]
It does sound like a lot of similar ground is covered. Are your offset and size values in bits or bytes? (I intended bits).
Currently they measure in bytes, but only because I didn't have a need for bit fields in structs or unions.
Maybe. We'll see ;-) All great ideas in the air! Thomas

Christian Tismer <tismer@tismer.com> writes:
That's incorrect, IIUC. __slots__ only has a special meaning for *new style classes*; it doesn't trigger anything in classic classes. New style classes always have object as *one* of their base classes, and most builtin types are new style classes also.
I'm not really understanding what you're proposing here. You could look at ctypes as implementing 'typed slots' with C-compatible layout:

    class A(object):
        __slots__ = ["x", "y", "z"]

    class B(ctypes.Structure):
        _fields_ = [("x", "c"), ("y", "i"), ("z", "q")]
        __slots__ = []

Instances of both A and B can only have 'x', 'y', and 'z' instance variables (or should I say slots); neither has a __dict__.
This is maybe also something that ctypes already does. The B *class* above knows all about this C structure:

    struct B {
        char x;
        int y;
        long long z;
    };
>>> ctypes.sizeof(B)
16
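[For illustration, a small hedged usage sketch of the B structure above, assuming the field codes "c"/"i"/"q" from Thomas's example map to char/int/long long as in the ctypes of that era:]

    b = B()
    b.x = 'A'        # char field takes a one-character string
    b.y = 42
    b.z = 1 << 40    # fits in the 64-bit slot
    print b.x, b.y, b.z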
The latter will allow to define the object class and all builtin types with the same machinery.
Thomas

Thomas Heller wrote:
Christian Tismer <tismer@tismer.com> writes: ...
Yes, I forgot about that, you are right.
I am trying to extend the new-style class machinery in a way that includes ctypes, somehow.
You could look at ctypes as implementing 'typed slots' with C-compatible layout.
How is that so different from my idea? ...
However, we have the same intent. ciao - chris -- Christian Tismer :^) <mailto:tismer@tismer.com> Mission Impossible 5oftware : Have a break! Take a ride on Python's Johannes-Niemeyer-Weg 9a : *Starship* http://starship.python.net/ 14109 Berlin : PGP key -> http://wwwkeys.pgp.net/ work +49 30 89 09 53 34 home +49 30 802 86 56 pager +49 173 24 18 776 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/

Christian Tismer wrote: ... A small addition:
Ok, what I was thinking of was to use ctypes or something similar to describe structs, and then to build all objects on top of this. This means that details like the type pointer and reference counts go into this definition as well, together with their behavior, and we are able to try different approaches as well. Probably this idea is trivial, and you thought this way all the time. ciao - chris -- Christian Tismer :^) <mailto:tismer@tismer.com> Mission Impossible 5oftware : Have a break! Take a ride on Python's Johannes-Niemeyer-Weg 9a : *Starship* http://starship.python.net/ 14109 Berlin : PGP key -> http://wwwkeys.pgp.net/ work +49 30 89 09 53 34 home +49 30 802 86 56 pager +49 173 24 18 776 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/

From: "Christian Tismer" <tismer@tismer.com>
maybe I'm stating the obvious, but I think all of this is useful and necessary to get a running/working Python in Python and for targeting C-like/machine-code-level backends. OTOH I think a higher level of abstraction is necessary to target more general backends. E.g., at such a level, what is relevant
- about an integer object is its value, and that its type is integer
- about the semantics of the non-control-flow and binding byte codes is e.g.:
  def binary_add(o1, o2): ...
but the fact that there's a bytecode eval loop is much less so. (From my experience) a relevant issue is how to abstract over the PyTypeObject struct and the interpreter-internal inheritance and lookup mechanisms (which do not correspond so directly to the language-level semantics). regards.

Hello, On Fri, Jan 24, 2003 at 05:48:42PM +0100, Samuele Pedroni wrote:
OTOH I think a higher level of abstraction is necessary to targert more general backends.
I agree with Samuele that we should not focus on ctypes or any other kind of structs right now. For all of ctypes' power I believe that it is not central to Python-in-Python. This will become important later, when we target C.
That's a point I would like to see discussed too. CPython has grown a quite complex set of routines to dispatch calls corresponding to the language operators. We could closely follow these algorithms, e.g. by translating PyNumber_Add() into a "number_add()" function testing the presence of the "nb_add" method on the arguments. This will be as messy as in CPython, but automatically gives exactly the same semantics.

On the other hand, these dispatchers are heavily loaded with historical stuff and workarounds, and could probably be better summarized in a higher-level implementation, taking into account the general direction in which Python seems to be evolving. We could probably design something that still offers compatibility. For example, we might find a general rule of multiple dispatching that corresponds to what CPython conceptually does. In other words, we could figure out a declarative way of saying which "nb_add" methods must be tried and in which order -- something that would find the set of all applicable "nb_add" methods, order them, and try them in that order until one of them succeeds.

This is complicated by internal coercions (the nb_coerce method), which tend to disappear from Python. We will have to choose whether we keep them in the core, at the possible expense of conceptual clarity, or completely drop them from the core classes (and possibly re-implement them later, e.g. using wrapper classes around the so-called old-style-numbers classes). A bientôt, Armin.
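[As a hedged sketch of such a declarative rule -- this approximates, rather than reproduces, CPython's real nb_add dispatch; the name number_add and the exact ordering are assumptions:]

    def number_add(v, w):
        # collect the applicable methods in conceptual priority order,
        # then try them until one does not return NotImplemented
        methods = []
        if hasattr(v, '__add__'):
            methods.append(lambda: v.__add__(w))
        if hasattr(w, '__radd__'):
            # the right operand goes first when its class is a proper
            # subclass of the left operand's class
            if isinstance(w, v.__class__) and w.__class__ is not v.__class__:
                methods.insert(0, lambda: w.__radd__(v))
            else:
                methods.append(lambda: w.__radd__(v))
        for try_one in methods:
            result = try_one()
            if result is not NotImplemented:
                return result
        raise TypeError("unsupported operand types for +")

    print number_add(1, 2)   # falls back to int.__add__ -> 3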

So you're suggesting backends besides C? That's a good idea, since it would allow us to in fact build a backend in C#, which I brought up as a joke earlier (no one seemed to have noticed the j/k) but which might not be such a bad idea. A Java backend would even be possible. If most of the code is Python anyway and a minimal amount is needed in another language to get things running, then there is huge potential to target other platforms, etc. It would even aid PPC/OS X support, which is something I greatly desire. -- Nathan Heagy phone:306.653.4747 fax:306.653.4774 http://www.zu.com

[Armin Rigo Mon, Jan 27, 2003 at 01:18:00PM +0100]
how do you intend to use any of the existing C libraries, then? Rely on CPython to provide the binding? I think that progress in the ctypes direction can happen in parallel with Python-core pythonifications. Being able to make C-library calls (like file I/O) without specialized intermediate C code does seem like an important feature. holger

----- Original Message ----- From: "holger krekel" <hpk@trillke.net> To: "Armin Rigo" <arigo@tunes.org>; <pypy-dev@codespeak.net> Sent: Monday, January 27, 2003 6:00 PM Subject: Re: [pypy-dev] Builtin types
The point was whether you want your builtin-type "abstractions" to be directly ctypes-based. Or, reformulated as a question: is it a goal to target some reasonable other virtual machines/languages/object models as execution substrate? (Or do you want to limit yourself to implementing some C/native-code re-edition/evolution of CPython?) No is obviously a fine answer. OTOH I think it is important to answer sooner rather than later, because if one thinks that, simply because this is (some) Python in Python, the problem is automatically solved, he is having the wrong intuition. regards.

At 20:03 2003-01-27 +0100, Samuele Pedroni wrote: the PyPy situation is more complex than that, and instead of lines, a foam of nesting bubble boundaries may be needed ;-)
ISTM there is also the possibility of source-to-source down-translation to a subset of the same language, as an end as well as a bootstrapping mechanism -- e.g., a common subset of Python 1.5.2 and 2.2.2, viewing Python 1.5.2 as a virtual machine operating on source byte streams, analogous to ceval.c operating on byte codes.

All of the above are really abstract views of what a CPU is doing at any given moment in some context, since a CPU is always doing the work with raw bits in registers and on-chip cache, getting and disposing of bits by electrically signaling other devices (mostly memory chips). I am trying to get my head around this in terms of mega and nano views of CLLG -- i.e., compile-link-load-go -- and the associated resource management and dispatching of control. Somehow I suspect there have to be abstractions for these things expressible in the language in a more fundamental way, reflecting cleaner and more comprehensive abstractions than a single-boundary interface of OS API and/or ctypes calls. But I'm not there yet, and not advising anyone to hold their breath. Just offering preliminary thoughts for the conceptual pot-luck ;-)
Best regards, Bengt

Bengt Richter wrote:
The reason why I thought we would need something like ctypes is this: plain Python has, by nature, no way to describe physical memory layouts and primitive types. There is the struct module with its limitations, but it is insufficient. Plain Python also does not have a way to describe restricted types at all, since it has no type declarations.

The minor point was to be able to re-build existing C structures. This may become interesting when we really try to build compatibility. More urgent to me is to be able to describe integer cells of fixed width and other primitive types. They have the known semantics of primitive C types. If we use Python integers all the time to describe the C implementation of builtin types, we end up with lots of hairy tricks to describe how they do not overflow but wrap around, how unsigned integers are right-shifted without sign extension, and all of that. The idea is to bind those semantics to ctypes instances. Rethinking this initial idea, I admit that it is equally possible to do that with custom classes, which can be defined to have these semantics.

I believe that we need these primitive types, or the re-implementation of Python's innards will differ much more from the original than we intended. There are already enough differences due to the different nature of the Python language. In order to keep as much of the existing code as possible for an initial bootstrap, I don't believe it is good to have to re-think each and every internal module in terms of different data types. Instead, I think it is easier to focus on the changed language layout, the lack of certain constructs, and different loop layouts, while leaving most of the data type behavior as it is.

A small example: for some benchmarking tests, I once re-implemented Python's MD5 module in Python, the best way I could. It ended up as a source very similar to the original, and only slightly *longer*! This is due to the fact that the algorithm makes constant use of unsigned integers and their shifting properties. For my implementation, that became quite a nightmare of casting to long integers, together with masking with & 0xffffffff in order to keep the longs short. This is quite nasty, almost totally prevented optimization by Psyco, and was disappointing. The alternative, re-writing the whole program to use only integer operations, would have led to even more lines of code and a whole set of new complications, since every statement would have to be tested for the signs of the arguments. For the curious, I'd be happy to post this code for study, and I'd like to encourage everybody who doesn't believe me to try to implement MD5 without using a single long integer.

Conclusion: my wish to use ctypes or some similar abstraction for primitive types comes from the observation that it is not always trivial to model primitive types with Python's own, and I think trying is counter-productive, since we will finally *have* to use primitive types to get to a useful implementation. cheers - chris
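[To make the pain concrete, here is a minimal sketch of the kind of fixed-width primitive Christian is asking for -- a wrapping 32-bit unsigned integer in plain Python 2; the class name and the small operator set are illustrative assumptions, not ctypes:]

    class uint32:
        MASK = 0xFFFFFFFFL   # Python-2-era long literal

        def __init__(self, value=0):
            self.value = long(value) & self.MASK

        def __add__(self, other):
            return uint32(self.value + long(other))   # wraps via the mask

        def __rshift__(self, n):
            return uint32(self.value >> n)   # value >= 0: no sign extension

        def __long__(self):
            return self.value

        def __repr__(self):
            return 'uint32(0x%08X)' % self.value

    def rol(x, n):
        # left-rotate, the MD5 workhorse that is painful with plain ints
        return uint32((long(x) << n) | (long(x) >> (32 - n)))

    print uint32(0xFFFFFFFFL) + 1       # uint32(0x00000000)
    print rol(uint32(0x80000000L), 1)   # uint32(0x00000001)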

Hello Holger, On Mon, Jan 27, 2003 at 08:03:40PM +0100, Samuele Pedroni wrote:
Yes, sorry. I was thinking about that, i.e. how we internally represent the built-in types. Being able to call external C code is another matter.
Yes, although I have another potential objection here. It might not be a problem to have specialized intermediate C code if this code is generated, just like -- after all, it's the goal -- most of the rest of the final C code of the interpreter. What we need, then, is a way to *describe* calls to a C function rather than a way to actually *do* the calls. So ctypes is a good way to express such a description, but it is not necessary to rely on libffi-style machine-code hackery to actually perform the calls; all we need to do is statically emit the specialized C code into the produced interpreter and obtain something close to the original CPython source. Of course I'm not the best person to talk about not liking machine-code hackery :-)

This would certainly be a great thing to have. It could make the core interpreter dynamically extensible, lower its memory footprint, and bring tons of other benefits. I was just pointing out that the original Python interpreter that we intend to write in Python should not use ctypes directly, but only higher-level abstractions -- ones that could in some cases be automatically translated to ctypes calls.
Is a goal to target some reasonable other virtual machines /languages/object models as execution substrate?
Yes. Armin

[Armin Rigo Mon, Jan 27, 2003 at 03:02:49PM -0800]
ok.
I am all for doing as much as possible at runtime. Being able to get a Python C-library binding dynamically (without relying on a C interpreter) makes it usable on platforms where you don't have the right C compiler ready -- besides just being a cool feature. Generating some C source for the interpreter itself still makes sense, though. But I'd like any code generation to remain simple -- including the generator code itself. Maybe it makes sense to compile to a 'nucleus' VM which has only very few byte codes and whose implementation can be generated. IMO the complexity of (and dependency on) C source generators could be reduced this way. greetings, holger
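[A toy sketch of the 'nucleus' VM idea; the opcode set and encoding are invented for illustration -- a real nucleus would be chosen so that its dispatch loop can itself be generated:]

    PUSH, ADD, PRINT, HALT = 0, 1, 2, 3

    def run(code):
        stack = []
        pc = 0
        while 1:
            op = code[pc]
            pc = pc + 1
            if op == PUSH:
                stack.append(code[pc])   # inline operand
                pc = pc + 1
            elif op == ADD:
                b = stack.pop()
                a = stack.pop()
                stack.append(a + b)
            elif op == PRINT:
                print stack.pop()
            elif op == HALT:
                return

    run([PUSH, 2, PUSH, 3, ADD, PRINT, HALT])   # prints 5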

Hello Holger, On Tue, Jan 28, 2003 at 01:22:55AM +0100, holger krekel wrote:
(...) IMO the complexity of (and dependency on) C source generators could be reduced this way.
Ok. But we must keep all doors open by expressing things abstractly, like defining classes for C function descriptions. By default, in the "no-op down-translation" obtained by running the Python-in-Python code over CPython, the actual calls are implemented with ctypes. Everything that forces a particular down-translation is bad, even if that particular down-translation seems good. I can think of other cases where we will need a description of C function signatures but not the code that actually calls them, e.g. Psyco (for which it will be useful to have some other information as well, like "does the C function have side-effects"). A bientôt, Armin.

[Armin Rigo Tue, Jan 28, 2003 at 03:05:52PM +0100]
Maybe not even that for starters.
Everything that forces a particular down-translation is bad, even if that particular down-translation seems good.
I am not sure I understand what you mean here. What I am aiming at is something like the following set of restrictions for implementing the pypy-python-interpreter:
- no nested scopes
- simple function calls, preferably with no arguments
- no list comprehensions
- no generators
- no += *= and friends
- global namespace contains only immutable objects
- very explicit names: always do e.g. 'self.valuestack.pop()' instead of 'self.pop()'
Of course the pypy-interpreter needs to provide all Python features to its higher-level Python code. But if we follow the above restrictions (and maybe some more), then we might -- for example -- easily inline the 'bytecode interpretation functions' by transforming instance attribute lookups ('self.valuestack.pop') to LOAD_FAST/STORE_FAST-style lookups; see the sketch below. IMO it's very comfortable to have a version which is verified to run on CPython (and Jython while we are at it) but can be used for the next (generational) steps.
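[A short sketch of what interpreter code written under these restrictions might look like; Frame and ValueStack are illustrative names, not a proposed API:]

    class ValueStack:
        def __init__(self):
            self.items = []
        def push(self, w_obj):
            self.items.append(w_obj)
        def pop(self):
            return self.items.pop()

    class Frame:
        def __init__(self):
            self.valuestack = ValueStack()

    def BINARY_ADD(frame):
        w_right = frame.valuestack.pop()   # full, explicit path --
        w_left = frame.valuestack.pop()    # never a shortcut like frame.pop()
        frame.valuestack.push(w_left + w_right)

    f = Frame()
    f.valuestack.push(40)
    f.valuestack.push(2)
    BINARY_ADD(f)
    print f.valuestack.pop()   # 42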
Sure, although the descriptions might not be accurate if they are not actually used. I explicitly don't think that the pypy-interpreter should require ctypes. So I don't think we are contradicting each other, are we? greetings, holger

Hello Holger, On Tue, Jan 28, 2003 at 03:33:09PM +0100, holger krekel wrote:
It seems we agree with each other. Sorry if I confused you. I was saying that the Python-in-Python interpreter itself should only rely on some custom descriptions for the external C functions. By "down-translation" I mean the same as your "next (generational) step", i.e. the static analysis of the Python-in-Python source to produce lower-level code (e.g. C). Various down-translations will do various things with these C function descriptions.
Yes, I was pointing out that the role of ctypes is particular in (only) this respect: it will probably be needed to run this verification --- unless all calls are also available from built-in modules provided by CPython.
What i am aiming at is something like the following set of restriction for implementing the pypy-python-interpreter:
I generally agree with you, although I would like to keep high-level Python structures available. I think the exact list will depend on what we feel to be necessary in a nice implementation, balanced against the expected complexity of the static analysis. In general I'd tend to favor a nice implementation.
- no nested scopes
We may even try to avoid nested functions altogether, and define more methods in our classes (unless it becomes confusing).
- simple function calls preferably with no arguments
Why not? Arguments are an essential abstraction which allows for many more optimizations than side-effect-based operations like storing values into instance attributes.
- no list comprehension
I've nothing against them. They are surely more conceptual than the corresponding "for" loop. We could reserve their use for particular cases, like when the expression computing each item has no side-effects (so we would write "[x+2 for x in y]" but not "[f(x) for x in y]" if f() has side-effects). In other words, we would use only list comprehensions that would work even if the construction returned a generator instead of directly computing the list.
- no generators
Ok.
- no += *= and friends
I've nothing against them, but ok.
- global namespace contains only immutable objects
Yes.
- very explicit names: always do e.g. 'self.valuestack.pop()' instead of 'self.pop()'
Yup. Other restrictions I would suggest:
- don't rely on simple operations throwing IndexError or OverflowError unless explicitly caught; e.g. all list item accesses should either be sure to fall within range, or be syntactically enclosed in a try:except: clause.
- don't use the same variable, function argument or instance attribute to hold values of possibly multiple types. Use an explicit 'assert isinstance...' here and there. Eventually a straightforward global program analysis should be able to tell which variable holds which type, for the majority of variables. (A small illustration follows below.)
- we can make exceptions to this rule, e.g. to allow either a value or None. In general I favor explicit special cases: I much prefer a variable to contain None to mark a special value than some value of the same type the variable normally contains, like -1. Even better, when possible, throw an exception or perform some other altogether different action instead.
A bientôt, Armin.
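[A minimal illustration of the last two rules -- one type per variable, pinned by an assert, and None rather than a sentinel like -1; find_name is a hypothetical helper, not code from the thread:]

    def find_name(names, target):
        assert isinstance(target, str)   # pins the argument's type
        for i in range(len(names)):
            if names[i] == target:
                return i
        return None   # explicit special case, not a sentinel like -1

    index = find_name(["a", "b"], "b")
    if index is None:
        print "not found"
    else:
        print "found at", index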

Hello, I was following this list, until it quite suddenly sent duplicates of several messages, then stopped getting mail entirely. Has this list been moved elsewhere? VanL

I guess I was just being alarmist... I was just used to 20+ messages a day. Dropping to 0 for a few days was unexpected. VanL

[VanL Thu, Jan 30, 2003 at 10:43:50AM -0700]
I guess I was just being alarmist... I was just used to 20+ messages a day. Dropping to 0 for a few days was unexpected.
I know this feeling. Just guessing, but I think that the relative quietness is mainly because we are still floating a lot. We don't have a concise strategy and roadmap yet. There is also no shared code yet. Hopefully I can concentrate on pypy-dev from next week on and set up some infrastructure and summary-kind of things. holger

holger krekel wrote:
That's it. Many ideas have been spread on this list, much has been said, but we don't have a summary. What we need now, in addition to the list, is a Wiki, where we can begin to build a project plan, summaries of (also differing) concepts, and the first existing demo code snippets.
Hopefully, I can concentrate on pypy-dev from next week on and setup some infrastructure and summary-kind of things.
That's what we need now. ciao - chris

Armin Rigo <arigo@tunes.org> writes:
I would very much like to see an easy-to-read and understandable core which is free of this cruft (even if it is not 100% compatible with CPython).
(and possibly re-implement them later, e.g. using wrapper classes around the so-called old-style-numbers classes).
Even better if this would be possible. Maybe later we can remove the wrapper classes and call the result Python 3000 ;-). No, only joking... Thomas

participants (10)
- Armin Rigo
- Bengt Richter
- Christian Tismer
- Darius Bacon
- holger krekel
- Nathan Heagy
- Samuele Pedroni
- Scott Fenton
- Thomas Heller
- VanL