Hi all,

This is a follow-up on last night's IRC discussion; it has reached the point where e-mail is the better medium. The topic is "WP3", i.e. how to write, in Python, built-in modules that work for both CPython and PyPy ("built-in" == "extension" == "compileable" for the purpose of this mail).

There are two levels of issues involved:

* In which style should the modules be written in the first place?

* Which style is easier to support in both CPython and PyPy, given what we have now?

In PyPy, we are using "mixed modules" with an interp-level and an app-level part, both optional, with explicit ways to interact between them. The interp-level part is what gets compiled to C. It contains code like 'space.add(w_1, w_2)' to perform an app-level addition.

Christian is working on an alternative: a single-source approach in which the annotator's failure to derive types is taken as a hint that the corresponding variables should contain wrapped objects. This is a convenient but hard-to-control extension of a historical hack, which works mostly because we needed it while we were still developing the annotator and the rtyper.

I am clearly biased towards "solution 1", which is to reuse the mixed-module style that we have evolved over several years. Here is a comparison of the implementation effort between the two styles (1 = mixed module, 2 = single source).

It is easy to develop and test in 2, as it runs on CPython with no further effort. For 1, the module developer needs the whole of PyPy to test his complete module. We could ease this by writing a dummy interface-compatible gateway and object space, performing sanity checks and giving useful error messages; then the developer only needs to check his code against this.

Annotation problems in the module are easier to spot early in model 1. Indeed, the fact that with 2 we cannot gracefully crash on SomeObjects, and moreover the need for many fragile-looking small extensions to the flow object space and the annotator, is what makes me most uneasy about 2.

The mixed modules are designed for PyPy, so they already work there now. For the single-source approach to work on PyPy, however, there would be many interesting and delicate problems -- both for py.py and for the translated pypy-c. I'd rather avoid thinking about it too much right now :-)

For translation, for 2 we already have the basic machinery implemented as a leftover from PyPy's "early ages". But I want to convince you that the basic support for 1 is extremely easy, or soon will be. We need a new object space against which the mixed module gets compiled; but this "CPyObjSpace" is completely trivial. It could be based on rctypes, in which case it would look like this:

    class CPyObjSpace:
        def newint(self, intval):
            return ctypes.pydll.PyInt_FromLong(intval)
        def add(self, w_1, w_2):
            return ctypes.pydll.PyNumber_Add(w_1, w_2)
        ...

Note that this even works on top of CPython! (Not out of the box on Linux, however, where for obscure reasons CPython is by default not compiled as a separate .so file...) The gateway code can be written in the same style, by using ctypes.pydll.PyObject_Call() to call the app-level parts, and ctypes callbacks for the reverse direction. The calls like 'ctypes.pydll.PyInt_FromLong()' return an instance of 'ctypes.py_object', which naturally plays the role of a wrapped object for the CPyObjSpace.
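To give a feel for it, here is a self-contained sketch of this style running on top of CPython with plain ctypes. The class and method names below are made up for illustration and this is not actual PyPy code; the real thing would go through rctypes so that it can also be translated, and reference counting is glossed over entirely:

    import ctypes

    api = ctypes.pythonapi    # the C API of the running CPython, as a PyDLL
    api.PyNumber_Add.restype = ctypes.py_object
    api.PyNumber_Add.argtypes = [ctypes.py_object, ctypes.py_object]
    api.PyObject_CallObject.restype = ctypes.py_object
    api.PyObject_CallObject.argtypes = [ctypes.py_object, ctypes.py_object]

    class CPyObjSpaceSketch:
        def wrap(self, obj):
            return ctypes.py_object(obj)          # a "wrapped" object

        def add(self, w_1, w_2):
            return api.PyNumber_Add(w_1, w_2)

        def call(self, w_callable, w_args):
            # gateway direction: interp-level code calling an app-level callable
            return api.PyObject_CallObject(w_callable, w_args)

    space = CPyObjSpaceSketch()
    w_res = space.add(space.wrap(40), space.wrap(2))        # -> the int 42
    w_hex = space.call(space.wrap(hex), space.wrap((42,)))  # -> '0x2a'

(ctypes converts py_object results back to real objects for us, so the last two lines really do give 42 and '0x2a' when run on CPython.)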
The more difficult bits of translation concern e.g. support for creating, at interp-level, types that are exposed to app-level, in particular the special methods __add__, __getitem__, etc. There is an example of that in pypy.module.Numeric.interp_numeric, where __getitem__ is added to the TypeDef of "array". This creates some difficulty that is common to approaches 1 and 2, and which I think Christian is also working on.

In 1 we would probably make the TypeDef declaration turn, for CPython, into static PyTypeObject structures. The special methods can be recognized and put into the proper slots, with a bit of wrapping (see the P.S. below for a rough sketch of such a wrapper). The whole PyTypeObject structure, with its pointers to functions, could be expressed with ctypes too:

    # use pypy.rpython.rctypes.ctypes_platform to get at the layout
    PyTypeObject = ctypes_platform.getstruct("PyTypeObject *", ...)

    def TypeDef(name, **rawdict):
        mytype = PyTypeObject()
        mytype.tp_name = name
        if '__getitem__' in rawdict:
            # build a ctypes callback and put it into the mp_subscript slot
            mp_subscript = callback_wrapper(rawdict['__getitem__'])
            mytype.tp_as_mapping.mp_subscript = mp_subscript
        return mytype

This gives us prebuilt PyTypeObject structures in ctypes, which get compiled into the C code by rctypes. At the same time, running on top of CPython is still possible; ctypes lets you issue the following kind of call dynamically when running on CPython:

    myinstance = ctypes.pydll.PyObject_Call(mytype)

The same line, translated by rctypes, becomes the following in the C file:

    PyObject* myinstance = PyObject_Call(mytype);

Of course, this latter part depends on rctypes being completed first.

I understand and respect Christian's need for something that works right now; nevertheless, at this point I think the mixed-modules approach is the best medium-term solution. This is all open to discussion, of course. ...but do keep in mind that the people who completed the annotator as it is now are not likely to appreciate the addition of Yet More Ad-Hoc Heuristics all over the place :-/

A bientot,

Armin
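P.S.: For concreteness, here is a minimal sketch of what a callback_wrapper() along the lines above could look like in plain ctypes. This is illustrative only, not actual PyPy code; error handling (returning NULL on app-level exceptions), going through the object space, and keeping the callback object alive are all glossed over:

    import ctypes

    # C signature of the mp_subscript slot: PyObject *(PyObject *, PyObject *).
    # PYFUNCTYPE keeps the GIL held around the callback, which is what we want
    # when the caller is the CPython interpreter itself.
    MP_SUBSCRIPT = ctypes.PYFUNCTYPE(ctypes.py_object,   # result
                                     ctypes.py_object,   # the object
                                     ctypes.py_object)   # the index/key

    def callback_wrapper(app_getitem):
        def slot_impl(w_obj, w_key):
            # a real wrapper would catch exceptions and report them properly;
            # calling the app-level implementation directly is enough here
            return app_getitem(w_obj, w_key)
        return MP_SUBSCRIPT(slot_impl)   # caller must keep this object alive

Plugging the resulting function pointer into tp_as_mapping.mp_subscript is then just a ctypes structure assignment, as in the TypeDef() sketch above.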