[pypy-dev] genpyrex -> genc ?

Armin Rigo arigo at tunes.org
Wed Aug 18 10:38:30 CEST 2004


Hello Richard,

On Tue, Aug 17, 2004 at 11:32:32PM +0100, Richard Emslie wrote:
> Not much to add.  It is far easier to understand the C (CPython) code than 
> the Pyrex as much closer to basic blocks.

That was my impression too.

> BTW New graphdisplay stuff is v. cool.  Would be nice if we could get a 
> popup with the generated C code for that block when mouse lingers over it, 
> somehow.

Yes, this would be useful in particular when the generated C code starts to 
depend on annotations (i.e. be typed).  My plan for that is to write a type
suffix in the macro calls, like:

   OP_ADD_iii(v5, v6, v7)

where each "i" means that the annotator has found the corresponding variable
to be an integer.  Then in genc.h we'd have several macros:

#define OP_ADD_iii(x,y,res)  res=x+y;
#define OP_ADD_ooo(x,y,res)  res=PyNumber_Add(x,y);

(Plus some error checking.)
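
As a rough illustration of what the error checking could look like for the
integer case only, here is a minimal sketch.  The extra error-label argument,
the overflow test and the function name add_checked are assumptions made for
this example; none of them is taken from genc.h.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical typed macro: operands and result are C longs.
   On overflow it jumps to the error label supplied by the caller
   (the extra "err" argument is an assumption of this sketch). */
#define OP_ADD_iii(x, y, res, err)                          \
    do {                                                    \
        if (((y) > 0 && (x) > LONG_MAX - (y)) ||            \
            ((y) < 0 && (x) < LONG_MIN - (y)))              \
            goto err;                                       \
        (res) = (x) + (y);                                  \
    } while (0)

/* Stand-in for one generated function: returns 0 on success,
   -1 if the addition would overflow. */
static int add_checked(long v5, long v6, long *out)
{
    long v7;

    assert(out != NULL);
    OP_ADD_iii(v5, v6, v7, err_overflow);
    *out = v7;
    return 0;

err_overflow:
    return -1;
}
```

An object-typed variant (the "ooo" case) would presumably test the result of
PyNumber_Add() for NULL and jump to the same kind of label.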

> Just a silly thought off the top of my head - we could generate very basic 
> c++ code (pretty much the same as the very basic c code) but with 
> reference counted c++ pointers / wrappers and use of C scoping rules 
> instead of PyObjects, and using the c++ exceptions mechanism to do pypy 
> exceptions.

Could be interesting to try.  It might not be obvious how to use the C++ scoping
rules to get the refcounting effect that we can get now (see example code
below), which is to FREE() precisely the variables that are active at the
current position -- Pyrex takes a very crude approach: initialize all
variables with NULL, and at the end Py_XDECREF() them all.  By contrast
genc.py only ever uses Py_DECREF() on variables that are known to be alive.  
Knowing which variables are alive is extremely simple in our flow model:
that's all input variables of the block, plus the "result" variables of the
operations above the one that failed.  Moreover when a block exits into
another block we can just transfer the references from the output variables to
the next block's input variables with no Py_XXCREF() at all -- this is the
part that is difficult to do with automatic scope-based refcounting in C++.
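
The liveness rule described above can be sketched in plain C with a toy
refcounted object standing in for PyObject.  Everything here (Obj, obj_new,
DECREF, op_div, block, live_objects) is a hypothetical stand-in invented for
the illustration, not the code that genc.py emits.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for PyObject: a refcount and a payload. */
typedef struct { int refcnt; long value; } Obj;

static int live_objects = 0;    /* objects allocated and not yet freed */

static Obj *obj_new(long value)
{
    Obj *o = malloc(sizeof *o);
    assert(o != NULL);          /* sketch: treat out-of-memory as fatal */
    o->refcnt = 1;
    o->value = value;
    live_objects++;
    return o;
}

#define DECREF(o) \
    do { if (--(o)->refcnt == 0) { free(o); live_objects--; } } while (0)

/* An operation that can fail: new reference, or NULL if b is zero. */
static Obj *op_div(Obj *a, Obj *b)
{
    if (b->value == 0)
        return NULL;
    return obj_new(a->value / b->value);
}

/* One block with input variables v1 and v2, whose references it owns.
   Each error label releases exactly the variables alive at that point:
   the inputs plus the results of the operations above the failing one. */
static Obj *block(Obj *v1, Obj *v2)
{
    Obj *v3, *v4;

    v3 = op_div(v1, v2);        /* alive here: v1, v2 */
    if (v3 == NULL) goto err_v3;
    v4 = op_div(v1, v3);        /* alive here: v1, v2, v3 */
    if (v4 == NULL) goto err_v4;

    DECREF(v3);
    DECREF(v2);
    DECREF(v1);
    return v4;                  /* reference handed over unchanged */

err_v4:
    DECREF(v3);
err_v3:
    DECREF(v2);
    DECREF(v1);
    return NULL;
}
```

No variable is initialized to NULL and no X-variant of the decref is needed,
because each label is only reachable when the variables it releases exist.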

The exception part is interesting too, though we might also think along the
lines of setjmp/longjmp to do them in C at some point.  But it is probable
(and to be hoped) that on some platforms where setjmp is expensive, C++
compilers can produce better code.  So C++ is worth a try, even if we only
use a very small number of non-C features.  The idea is to avoid gradually
re-introducing the Pyrex problems if we target C++: convoluting the generated
source code to force it into some high-level concept of C++ when just plain C
is a reasonable alternative.
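
For comparison, a minimal sketch of the setjmp/longjmp route in plain C.  The
names handler, pending_error, raise_error, checked_div and try_div are
hypothetical, and a single global jmp_buf is an assumption that only works
for one level of handler; nested handlers would need a stack of them.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf handler;         /* the one active "except" point */
static int pending_error;       /* error code of the raised exception */

/* Raise: record the error and unwind to the handler. */
static void raise_error(int code)
{
    assert(code != 0);          /* 0 is reserved for "no error" here */
    pending_error = code;
    longjmp(handler, 1);
}

static long checked_div(long a, long b)
{
    if (b == 0)
        raise_error(1);         /* does not return */
    return a / b;
}

/* Try/except: returns 0 and stores the quotient, or -1 on error. */
static int try_div(long a, long b, long *out)
{
    if (setjmp(handler) == 0) {
        *out = checked_div(a, b);
        return 0;
    }
    return -1;                  /* reached via longjmp */
}
```

Note that longjmp skips any cleanup in the frames it unwinds, so the
references alive in those frames would still have to be released somehow.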

Musing aloud, a few targets that would probably be worth a try (or more):

* assembler code directly ((c) mwh),

* the brand new Mini C compiler written in Python, without the C front-end,
  targeting its intermediate representation,

* LLVM,

* Lisp again -- but not the current gencl.py based on the obfuscated
  genpyrex.py, but a minimal translator only a few pages long that just
  produces calls to Lisp macros (like genc.py currently does for C).
  After all Lisp macros are *really* powerful and we'll have no problem
  compiling complex type-dependent code with them.  But these macros should
  be defined in Lisp instead of gencl.py doing the dirty work.

* Parrot, a JVM class file, .NET CLR, Psyco's ivm (low-level interpreter),
  etc.


A bientôt,

Armin.


