My extension code generator for C++

Sat Jul 3 13:22:39 EDT 2010

It's still in the rough, but I wanted to give an update on my C++ 
extension generator. It's available at http://github.com/Rouslan/PyExpose

The documentation is a little slim right now, but there is a 
comprehensive set of examples in test/test_kompile.py (replace the k 
with a c; for some reason, if I post this message with the correct name, 
it doesn't show up). The program takes an input file like

  <?xml version="1.0"?>
  <module name="modulename" include="vector">
      <doc>module doc string</doc>

      <class name="DVector" type="std::vector<double>">
          <doc>class doc string</doc>
          <init overload=""/>
          <init overload="size_t,const double&"/>
          <property name="size" get="size" set="resize"/>
          <def func="push_back"/>
          <def name="__sequence__getitem__" func="at" return-semantic="copy"/>
          <def name="__sequence__setitem__" assign-to="at"/>
      </class>
  </module>

and generates the code for a Python extension.

The goal has been to generate code with zero overhead. In other words, I 
wanted to eliminate the tedium of creating an extension without 
sacrificing anything. In addition to generating a code file, the 
input above results in a header file with the following:

extern PyTypeObject obj_DVectorType;
inline PyTypeObject *get_obj_DVectorType() { return &obj_DVectorType; }
struct obj_DVector {
     PyObject_HEAD
     storage_mode mode;
     std::vector<double,std::allocator<double> > base;

     PY_MEM_NEW_DELETE
     obj_DVector() : base() {
         PyObject_Init(reinterpret_cast<PyObject*>(this),get_obj_DVectorType());
         mode = CONTAINS;
     }
     obj_DVector(std::allocator<double> const & _0) : base(_0) {
         PyObject_Init(reinterpret_cast<PyObject*>(this),get_obj_DVectorType());
         mode = CONTAINS;
     }
     obj_DVector(long unsigned int _0,double const & _1,
                 std::allocator<double> const & _2) : base(_0,_1,_2) {
         PyObject_Init(reinterpret_cast<PyObject*>(this),get_obj_DVectorType());
         mode = CONTAINS;
     }
     obj_DVector(std::vector<double,std::allocator<double> > const & _0)
         : base(_0) {
         PyObject_Init(reinterpret_cast<PyObject*>(this),get_obj_DVectorType());
         mode = CONTAINS;
     }
};

so the object can be allocated in your own code as a single block of 
memory rather than having a PyObject contain a pointer to the exposed type.
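
To illustrate the difference, here is a rough sketch of the two layouts. 
It is not the generated code: head_sketch is a made-up stand-in for 
PyObject_HEAD so the snippet compiles without Python.h, and all the 
*_sketch names and lives_inside are hypothetical. Note that only the 
vector object itself is stored inline; the vector still allocates its own 
element buffer.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for PyObject_HEAD so the sketch compiles without Python.h.
struct head_sketch {
    std::ptrdiff_t refcount;
    void *type;
};

// Layout used by the generated header: the vector object is stored inline.
struct embedded_sketch {
    head_sketch head;
    std::vector<double> base;
};

// The alternative: the wrapper only points at a separately allocated vector.
struct indirect_sketch {
    head_sketch head;
    std::vector<double> *base;
};

// True if p points somewhere inside the storage occupied by obj.
template <typename T>
bool lives_inside(const T &obj, const void *p) {
    const std::uintptr_t begin = reinterpret_cast<std::uintptr_t>(&obj);
    const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
    return addr >= begin && addr < begin + sizeof(T);
}
```

With the embedded layout, one allocation covers both the header and the 
vector object, and reaching the vector costs no extra pointer dereference.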

storage_mode is an enumeration, adding very little to the size of the 
Python object (or maybe nothing, depending on alignment). If you add 
new-initializes="true" to the <class> tag and the exposed type never 
needs to be held by a pointer/reference (which is needed when the exposed 
object is inside another class/struct), even that variable gets omitted.

The code also never uses PyArg_ParseTuple or its variants. It converts 
every argument using the appropriate PyX_AsY function (and every return 
value using the appropriate PyX_FromY function). I noticed PyBindGen 
does the following when a conversion is needed for one argument:

py_retval = Py_BuildValue((char *) "(O)", value);
if (!PyArg_ParseTuple(py_retval, (char *) "i", &self->obj->y)) {
     Py_DECREF(py_retval);
     return -1;
}
Py_DECREF(py_retval);

On the other hand, here's the implementation for __sequence__getitem__:

PyObject * obj_DVector___sequence__getitem__(obj_DVector *self,Py_ssize_t index) {
     try {
         std::vector<double,std::allocator<double> > &base =
             cast_base_DVector(reinterpret_cast<PyObject*>(self));
         return PyFloat_FromDouble(base.at(py_ssize_t_to_ulong(index)));
     } EXCEPT_HANDLERS(0)
}

(cast_base_DVector checks that base is initialized and gets a reference 
to it according to how it's stored in obj_DVector. If the class is 
new-initialized and only needs one means of storage, its code will just 
be "return obj_DVector->base;" and should be inlined by an optimizing 
compiler.)
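
To make the pieces of that getter a bit more concrete, here is a rough 
sketch of what the two helpers could look like. This is not the generated 
code. The struct is a simplified stand-in without PyObject_HEAD so it 
compiles without Python.h, std::ptrdiff_t stands in for Py_ssize_t, and 
the UNINITIALIZED and POINTS_TO enumerators, the external member, and 
every *_sketch name are hypothetical (only storage_mode and CONTAINS 
appear in the real output). The behaviour of the index conversion is 
likewise an assumption.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// CONTAINS is the only enumerator shown in the post; the others are made up.
enum storage_mode { UNINITIALIZED, CONTAINS, POINTS_TO };

// Simplified stand-in for obj_DVector (no PyObject_HEAD, no Python.h).
struct obj_DVector_sketch {
    storage_mode mode;
    std::vector<double> base;       // used when mode == CONTAINS
    std::vector<double> *external;  // hypothetical: used when mode == POINTS_TO

    obj_DVector_sketch() : mode(CONTAINS), base(), external(nullptr) {}
};

// Returns the wrapped vector for the current mode, or throws if there is none.
inline std::vector<double> &cast_base_sketch(obj_DVector_sketch &self) {
    switch (self.mode) {
    case CONTAINS:
        return self.base;
    case POINTS_TO:
        if (self.external) return *self.external;
        break;
    default:
        break;
    }
    throw std::logic_error("base is not initialized");
}

// Assumed behaviour for a helper like py_ssize_t_to_ulong: refuse negative
// values rather than let them wrap around to a huge unsigned number.
inline unsigned long ssize_to_ulong_sketch(std::ptrdiff_t x) {
    if (x < 0) throw std::out_of_range("index out of range");
    return static_cast<unsigned long>(x);
}

// Same shape as the generated getter, minus the conversion to a Python float.
inline double getitem_sketch(obj_DVector_sketch &self, std::ptrdiff_t index) {
    return cast_base_sketch(self).at(ssize_to_ulong_sketch(index));
}
```

When only one storage mode is possible, the switch disappears and the 
first helper reduces to returning self.base, which is the case an 
optimizing compiler should inline.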

I'm really interested in what people think of this little project.