Hi all,

This is a follow-up on last night's IRC discussion; it has reached the point where e-mail is the better medium. The topic is "WP3", i.e. how to write, in Python, built-in modules that work for both CPython and PyPy ("built-in" == "extension" == "compileable" for the purpose of this mail).

There are two levels of issues involved:

* In which style should the modules be written in the first place?

* Which style is easier to support in both CPython and PyPy, given what we have now?

In PyPy, we are using "mixed modules" with an interp-level and an app-level part, both optional, with explicit ways to interact between them. The interp-level part is what gets compiled to C. It contains code like 'space.add(w_1, w_2)' to perform an app-level addition.

Christian is working on an alternative: a single-source approach in which the annotator's failure to derive types is taken as a hint that the corresponding variables should contain wrapped objects. This is a convenient but hard-to-control extension of a historical hack, which works mostly because we needed it while we were still developing the annotator and the rtyper.

I am clearly biased towards "solution 1", which is to reuse the mixed-module style that we have evolved over several years. Here is a comparison of the implementation effort between the two styles (1 = mixed module, 2 = single source).

It is easy to develop and test in 2, as it runs on CPython with no further effort. For 1, the module developer needs the whole of PyPy to test his complete module. We could ease this by writing a dummy interface-compatible gateway and object space, performing sanity checks and giving useful error messages; then the developer only needs to check his code against this.

Annotation problems in the module are easier to spot early in model 1. Indeed, the fact that with 2 we cannot gracefully crash on SomeObjects, and moreover the need for many fragile-looking small extensions to the flow object space and the annotator, is what makes me most uneasy about 2.

The mixed modules are designed for PyPy, so they already work there now. For the single-source approach to work on PyPy, however, there would be many interesting and delicate problems -- both for py.py and for the translated pypy-c. I'd rather avoid thinking about it too much right now :-)

For translation, for 2 we already have the basic machinery implemented as a leftover from PyPy's "early ages". But I want to convince you that the basic support for 1 is extremely easy, or soon will be. We need a new object space against which the mixed module gets compiled; but this "CPyObjSpace" is completely trivial. It could be based on rctypes, in which case it would look like this:

    class CPyObjSpace:
        def newint(self, intval):
            return ctypes.pydll.PyInt_FromLong(intval)
        def add(self, w_1, w_2):
            return ctypes.pydll.PyNumber_Add(w_1, w_2)
        ...

Note that this even works on top of CPython! (Not out of the box on Linux, however, where for obscure reasons CPython is by default not compiled as a separate .so file...) The gateway code can be written in the same style, by using ctypes.pydll.PyObject_Call() to call the app-level parts, and ctypes callbacks for the reverse direction. The calls like 'ctypes.pydll.PyInt_FromLong()' return an instance of 'ctypes.py_object', which naturally plays the role of a wrapped object for the CPyObjSpace.
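To give a feel for it, here is a self-contained sketch of this style running on top of CPython with plain ctypes. The class and method names below are made up for illustration and this is not actual PyPy code; the real thing would go through rctypes so that it can also be translated, and reference counting is glossed over entirely:

    import ctypes

    api = ctypes.pythonapi    # the C API of the running CPython, as a PyDLL
    api.PyNumber_Add.restype = ctypes.py_object
    api.PyNumber_Add.argtypes = [ctypes.py_object, ctypes.py_object]
    api.PyObject_CallObject.restype = ctypes.py_object
    api.PyObject_CallObject.argtypes = [ctypes.py_object, ctypes.py_object]

    class CPyObjSpaceSketch:
        def wrap(self, obj):
            return ctypes.py_object(obj)          # a "wrapped" object

        def add(self, w_1, w_2):
            return api.PyNumber_Add(w_1, w_2)

        def call(self, w_callable, w_args):
            # gateway direction: interp-level code calling an app-level callable
            return api.PyObject_CallObject(w_callable, w_args)

    space = CPyObjSpaceSketch()
    w_res = space.add(space.wrap(40), space.wrap(2))        # -> the int 42
    w_hex = space.call(space.wrap(hex), space.wrap((42,)))  # -> '0x2a'

(ctypes converts py_object results back to real objects for us, so the last two lines really do give 42 and '0x2a' when run on CPython.)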
The more difficult bits of translation concern e.g. support for creating, at interp-level, types that are exposed to app-level, in particular the special methods __add__, __getitem__, etc. There is an example of that in pypy.module.Numeric.interp_numeric, where __getitem__ is added to the TypeDef of "array". This creates some difficulty that is common to approaches 1 and 2, and which I think Christian is also working on.

In 1 we would probably make the TypeDef declaration turn, for CPython, into static PyTypeObject structures. The special methods can be recognized and put into the proper slots, with a bit of wrapping (see the P.S. below for a rough sketch of such a wrapper). The whole PyTypeObject structure, with its pointers to functions, could be expressed with ctypes too:

    # use pypy.rpython.rctypes.ctypes_platform to get at the layout
    PyTypeObject = ctypes_platform.getstruct("PyTypeObject *", ...)

    def TypeDef(name, **rawdict):
        mytype = PyTypeObject()
        mytype.tp_name = name
        if '__getitem__' in rawdict:
            # build a ctypes callback and put it into the mp_subscript slot
            mp_subscript = callback_wrapper(rawdict['__getitem__'])
            mytype.tp_as_mapping.mp_subscript = mp_subscript
        return mytype

This gives us prebuilt PyTypeObject structures in ctypes, which get compiled into the C code by rctypes. At the same time, running on top of CPython is still possible; ctypes lets you issue the following kind of call dynamically when running on CPython:

    myinstance = ctypes.pydll.PyObject_Call(mytype)

The same line, translated by rctypes, becomes the following in the C file:

    PyObject* myinstance = PyObject_Call(mytype);

Of course, this latter part depends on rctypes being completed first.

I understand and respect Christian's need for something that works right now; nevertheless, at this point I think the mixed-modules approach is the best medium-term solution. This is all open to discussion, of course. ...but do keep in mind that the people who completed the annotator as it is now are not likely to appreciate the addition of Yet More Ad-Hoc Heuristics all over the place :-/

A bientot,

Armin
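P.S.: For concreteness, here is a minimal sketch of what a callback_wrapper() along the lines above could look like in plain ctypes. This is illustrative only, not actual PyPy code; error handling (returning NULL on app-level exceptions), going through the object space, and keeping the callback object alive are all glossed over:

    import ctypes

    # C signature of the mp_subscript slot: PyObject *(PyObject *, PyObject *).
    # PYFUNCTYPE keeps the GIL held around the callback, which is what we want
    # when the caller is the CPython interpreter itself.
    MP_SUBSCRIPT = ctypes.PYFUNCTYPE(ctypes.py_object,   # result
                                     ctypes.py_object,   # the object
                                     ctypes.py_object)   # the index/key

    def callback_wrapper(app_getitem):
        def slot_impl(w_obj, w_key):
            # a real wrapper would catch exceptions and report them properly;
            # calling the app-level implementation directly is enough here
            return app_getitem(w_obj, w_key)
        return MP_SUBSCRIPT(slot_impl)   # caller must keep this object alive

Plugging the resulting function pointer into tp_as_mapping.mp_subscript is then just a ctypes structure assignment, as in the TypeDef() sketch above.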