
Hello,
Perhaps generating Pyrex code isn't such a great idea. The great idea was to use CPython as an intermediate target. But writing C extension modules in C instead of in Pyrex is quite possibly much cleaner, because:
* very basic C code is sufficient;
* removing yet another intermediate step might not be a bad idea for clarity;
* reference counting is easy because we know exactly when variables get out of scope: when they reach the end of a basic block and are not sent to the next block;
* we don't have to work around various Pyrex restrictions (or hack Pyrex to lift them);
* most importantly, the whole mess with the "class Op" in genpyrex.py can probably be completely omitted, because we can use C macros and generate just a succession of OP_XXX(a,b,c) where XXX is the operation name and a,b,c are the arguments. Mapping the operations to actual C code is done once and for all in a C header file with macro definitions.
A quick test (borrowing from genpyrex.py) is checked in a branch in http://www.codespeak.net/svn/pypy/branch/pypy-genc/ producing (not yet compilable) C code that looks like the attached example. It looks much like assembler code. It's quite readable and easy to map to the original flow graph, too.
Not sure what to do now. Does this look like a good idea?
A bientôt, Armin.
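The attached example is not part of this archive. As a rough illustration only, the "succession of OP_XXX(a,b,c)" idea could look like the sketch below. The macro bodies, the function name sum_below and the block labels are assumptions made for this illustration, and plain C longs stand in for PyObject* so that it compiles without the CPython headers.

```c
#include <assert.h>

/* Header part (hypothetical): one macro per flow-graph operation.
   Plain longs stand in for PyObject* so this compiles without Python.h. */
#define OP_ADD(x, y, res)  res = (x) + (y);
#define OP_LT(x, y, res)   res = (x) < (y);

/* Generated part: sums 0 .. n-1, with one label per basic block. */
long sum_below(long n)
{
    long v1, v2, v3, v4, v5;

    /* block0: entry */
    v1 = 0;                 /* running total */
    v2 = 0;                 /* loop counter */
    goto block1;

block1:                     /* inputs: v1, v2 */
    OP_LT(v2, n, v3)
    if (v3) goto block2;
    goto block3;

block2:                     /* inputs: v1, v2 */
    OP_ADD(v1, v2, v4)
    OP_ADD(v2, 1, v5)
    v1 = v4;                /* pass the outputs to block1's inputs */
    v2 = v5;
    goto block1;

block3:                     /* input: v1 */
    return v1;
}

/* Usage: a tiny self-check that a test driver could call. */
void check_sum_below(void)
{
    assert(sum_below(5) == 10);   /* 0+1+2+3+4 */
    assert(sum_below(0) == 0);
}
```

Each label corresponds to one block of the flow graph, which is what makes this style read like assembler and map back to the graph easily.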

Hi Armin, On Tue, 17 Aug 2004, Armin Rigo wrote:
Not much to add. It is far easier to understand the C (CPython) code than the Pyrex, as it is much closer to basic blocks. Wasn't looking forward to having to dive into the Pyrex source code to try to figure out what went wrong, when it does go wrong. :-) Anyways you said that already. +1 from me.
BTW New graphdisplay stuff is v. cool. Would be nice if we could get a popup with the generated C code for that block when mouse lingers over it, somehow.
Just a silly thought off the top of my head - we could generate very basic C++ code (pretty much the same as the very basic C code) but with reference counted C++ pointers / wrappers and use of C++ scoping rules instead of PyObjects, and using the C++ exceptions mechanism to do pypy exceptions. That probably needs some expanding / rationalization, but you get the idea (I might have a play at the weekend.)
Cheers, Richard

Hello Richard, On Tue, Aug 17, 2004 at 11:32:32PM +0100, Richard Emslie wrote:
> Not much to add. It is far easier to understand the C (CPython) code than the Pyrex, as it is much closer to basic blocks.
That was my impression too.
Yes, this would be useful in particular when the generated C code starts to depend on annotations (i.e. be typed). My plan to do that is to write a type suffix in the macro calls, like:
    OP_ADD_iii(v5, v6, v7)
where each "i" means that the annotator has found the corresponding variable to be an integer. Then in genc.h we'd have several macros:
    #define OP_ADD_iii(x,y,res) res=x+y;
    #define OP_ADD_ooo(x,y,res) res=PyNumber_Add(x,y);
(Plus some error checking.)
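A minimal sketch of how the two macro flavours might sit side by side. The Obj type and the obj_* helpers are toy stand-ins for PyObject and PyNumber_Add so that the example compiles without the CPython headers, and the extra err label argument is only a guess at what the "error checking" could look like, not the actual genc.h.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for PyObject (a boxed long), so the sketch needs no Python.h. */
typedef struct { long refcnt; long value; } Obj;

static Obj *obj_new(long value)
{
    Obj *o = malloc(sizeof *o);
    if (o) { o->refcnt = 1; o->value = value; }
    return o;
}

static void obj_decref(Obj *o)
{
    assert(o != NULL && o->refcnt > 0);
    if (--o->refcnt == 0) free(o);
}

/* Plays the role of PyNumber_Add: new reference, or NULL on error. */
static Obj *obj_add(Obj *x, Obj *y)
{
    return obj_new(x->value + y->value);
}

/* "iii": all three variables are C integers. */
#define OP_ADD_iii(x, y, res)       res = (x) + (y);
/* "ooo": all three are objects; the err label is an assumption of this sketch. */
#define OP_ADD_ooo(x, y, res, err)  if (!(res = obj_add(x, y))) goto err;

long add_typed(long v5, long v6)
{
    long v7;
    OP_ADD_iii(v5, v6, v7)
    return v7;
}

/* Boxes a and b, adds them through the "ooo" macro, and unboxes the result.
   Returns 0 on success, -1 on (allocation) error. */
int add_untyped(long a, long b, long *out)
{
    Obj *v5 = obj_new(a);
    Obj *v6 = obj_new(b);
    Obj *v7;
    int status = -1;

    if (!v5 || !v6) goto err;
    OP_ADD_ooo(v5, v6, v7, err)
    *out = v7->value;
    obj_decref(v7);
    status = 0;
err:
    if (v5) obj_decref(v5);
    if (v6) obj_decref(v6);
    return status;
}
```

The generator only has to pick the suffix from the annotations; the operation name and argument order stay the same in both cases.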
Could be interesting to try. It might not be obvious how to use the C++ scoping rules to get the refcounting effect that we can get now (see example code below), which is to FREE() precisely the variables that are active at the current position -- Pyrex takes a very crude approach: initialize all variables with NULL, and at the end Py_XDECREF() them all. By contrast genc.py only ever uses Py_DECREF() on variables that are known to be alive. Knowing which variables are alive is extremely simple in our flow model: that's all input variables of the block, plus the "result" variables of the operations above the one that failed. Moreover when a block exits into another block we can just transfer the references from the output variables to the next block's input variables with no Py_XXCREF() at all -- this is the part that is difficult to do with automatic scope-based refcounting in C++.
The exception part is interesting too, though we might also think along the lines of setjmp/longjmp to do them in C at some point. But it is probable (and to be hoped) that on some platforms where setjmp is expensive, C++ compilers can produce better code. So C++ is worth a try, even if we only use a very small number of non-C features. The idea is to avoid gradually re-introducing the Pyrex problems if we target C++: convoluting the generated source code to force it into some high-level concept of C++ when just plain C is a reasonable alternative.
Musing aloud, a few targets that would probably be worth a try (or more):
* directly assembler code ((c) mwh),
* the brand new Mini C compiler written in Python, without the C front-end, targeting its intermediate representation,
* LLVM,
* Lisp again -- but not the current gencl.py based on the obfuscated genpyrex.py, but a minimal translator only a few pages long that just produces calls to Lisp macros (like genc.py currently does for C). After all Lisp macros are *really* powerful and we'll have no problem compiling complex type-dependent code with them. But these macros should be defined in Lisp instead of gencl.py doing the dirty work.
* Parrot, a JVM class file, .NET CLR, Psyco's ivm (low-level interpreter), etc.
A bientôt, Armin.
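The liveness rule described in this message (block inputs plus the results of the operations above the failing one) can be sketched roughly as follows. This is not the output of genc.py: the Obj type, the helpers, the err0/err1 labels and the failure-injection counter are all assumptions made so that the example is self-contained and its refcounting can be checked.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for PyObject, so the sketch needs no Python.h. */
typedef struct { long refcnt; long value; } Obj;

static long live_objects = 0;     /* leak detector for the sketch */
static int fail_countdown = -1;   /* >= 0: allocations left before obj_new fails */

static Obj *obj_new(long value)
{
    Obj *o;
    if (fail_countdown == 0) return NULL;     /* simulated error */
    if (fail_countdown > 0) fail_countdown--;
    o = malloc(sizeof *o);
    if (!o) return NULL;
    o->refcnt = 1;
    o->value = value;
    live_objects++;
    return o;
}

static void obj_decref(Obj *o)
{
    assert(o != NULL && o->refcnt > 0);
    if (--o->refcnt == 0) { live_objects--; free(o); }
}

#define OP_ADD(x, y, res, err)  if (!(res = obj_new((x)->value + (y)->value))) goto err;
#define OP_MUL(x, y, res, err)  if (!(res = obj_new((x)->value * (y)->value))) goto err;

/* Computes (v1 + v2) * v1. Convention of this sketch only: the function
   consumes both input references, the way a block owns its input variables. */
Obj *add_then_mul(Obj *v1, Obj *v2)
{
    Obj *v3, *v4, *v5;

    /* block0: inputs v1, v2 */
    OP_ADD(v1, v2, v3, err0)
    OP_MUL(v3, v1, v4, err1)
    /* exit: v1, v2, v3 are not sent to the next block, so they die here */
    obj_decref(v3);
    obj_decref(v2);
    obj_decref(v1);
    v5 = v4;     /* v4's reference moves to block1's input, no refcount change */
    goto block1;

block1:          /* input: v5 */
    return v5;

err1:            /* OP_MUL failed: live = inputs + v3 */
    obj_decref(v3);
err0:            /* OP_ADD failed: live = inputs only */
    obj_decref(v2);
    obj_decref(v1);
    return NULL;
}
```

The error labels fall through from the latest operation to the earliest, so each failure point releases exactly the variables that exist at that point and nothing needs to be initialized to NULL.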

Hi Armin, [Armin Rigo Tue, Aug 17, 2004 at 08:21:45PM +0100]
> Perhaps generating Pyrex code isn't such a great idea.
i'd dare to say it was a good idea at the time because it worked and allowed us to test/debug/hack things quickly.
> The great idea was to use CPython as an intermediate target.
yes, that was at the core.
> But writing C extension modules in C instead of in Pyrex is quite possibly much cleaner [...]
It should be much cleaner in the end, although it seems you are currently avoiding generating e.g. C-int <-> PyInt style code, and not caring about exceptions. At least i liked Pyrex for helping with all this necessary "fluff" as well as integration into CPython so that it was easy to test the newly generated code. This is of course all possible to do ourselves and it's probably a good time to do it.
Yes, i hope it doesn't take too much time to actually generate C that doesn't result in segfaults all the time :-)
> It looks much like assembler code. It's quite readable and easy to map to the original flow graph, too.
Yes, that's nice.
> Not sure what to do now. Does this look like a good idea?
try to get it to actually work and integrated into the tests?
cheers, holger

Hello Holger, On Wed, Aug 18, 2004 at 01:18:39PM +0200, holger krekel wrote:
You are spot on. My quick hack is simple because it doesn't try to do typing, exceptions, classes, etc. I've spent some time trying to add typing to it, with automatic conversions like C-int -> PyIntObject, and as you guessed it's becoming a mess... Still trying to figure out if there is a clean way to put it in, e.g. in a separate phase that would come before code generation itself.
A bientôt, Armin.
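One way to picture such a separate phase, purely as an illustration: it would insert explicit conversion operations into the operation list, so that the code generator itself only emits macros whose type suffixes already match. The CONV_* macro names and the toy Obj type below are hypothetical and do not come from the branch; with CPython the object side would presumably go through PyInt_FromLong and PyInt_AsLong, and the object-to-int direction would then need its own error check.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for PyIntObject, so the sketch needs no Python.h. */
typedef struct { long refcnt; long value; } Obj;

static Obj *obj_from_long(long value)   /* plays the role of PyInt_FromLong */
{
    Obj *o = malloc(sizeof *o);
    if (o) { o->refcnt = 1; o->value = value; }
    return o;
}

static void obj_decref(Obj *o)
{
    assert(o != NULL && o->refcnt > 0);
    if (--o->refcnt == 0) free(o);
}

#define OP_ADD_iii(x, y, res)   res = (x) + (y);
/* Hypothetical conversion operations, named like the typed macros. */
#define CONV_o_i(x, res)        res = (x)->value;
#define CONV_i_o(x, res, err)   if (!(res = obj_from_long(x))) goto err;

/* v1 is an object (borrowed), v2 is annotated as a C integer.
   Returns a new reference, or NULL on error. */
Obj *add_mixed(Obj *v1, long v2)
{
    long v3, v4;
    Obj *v5;

    CONV_o_i(v1, v3)          /* inserted by the conversion phase */
    OP_ADD_iii(v3, v2, v4)    /* the typed operation itself */
    CONV_i_o(v4, v5, err)     /* inserted: the caller expects an object */
    return v5;
err:
    return NULL;
}

/* Usage: box 40, add the C integer 2, and read back the result. */
long demo_add_mixed(void)
{
    Obj *a = obj_from_long(40);
    Obj *r = a ? add_mixed(a, 2) : NULL;
    long result = r ? r->value : -1;
    if (r) obj_decref(r);
    if (a) obj_decref(a);
    return result;
}
```

With the conversions made explicit as operations, the macro for each operation stays as simple as in the untyped quick hack.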
