On Thu, Oct 23, 2008 at 5:25 AM, Maciej Fijalkowski <fijall@gmail.com> wrote:
Hey.
First, sorry for the late response; we're kind of busy doing other things now (i.e. working on a 2.5-compatible release). That doesn't mean we don't appreciate input about our problems.
On Fri, Oct 17, 2008 at 5:50 AM, Geoffrey Irving <irving@naml.us> wrote: <snip>
That's true. PyPy is able to handle pointers to any C place.
.. with separate information about type and ownership.
We don't provide this, since C has no notion of that at all.
At the lowest level the type is just a hashable identifier object, so it can probably be implemented at the RPython level. E.g.,

    # RPython type-safety layer

    class CppObject:
        def __init__(self, ptr, type):
            self.ptr = ptr                     # pointer to the actual C++ instance
            self.type = type                   # represents the C++ type
            self.destructor = type.destructor  # function pointer to destructor

        def __traverse__(self):
            ... traverse through list of contained python object pointers ...

        def __del__(self):
            CCall(self.destructor, self.ptr)

    class CppFunc:
        def __init__(self, ptr, resulttype, argtypes):
            self.ptr = ptr
            self.resulttype = resulttype
            self.argtypes = argtypes

        def __call__(self, *args):
            if len(args) != len(self.argtypes):
                raise TypeError(...)
            argptrs = []
            for a, t in zip(args, self.argtypes):
                if not isinstance(a, CppObject) or a.type != t:
                    raise TypeError(...)
                argptrs.append(a.ptr)
            resultptr = Alloc(self.resulttype.size)
            try:
                CppCall(self.ptr, resultptr, *argptrs)  # assumes specific calling convention
            except CppException, e:  # CppCall would have to generate this
                Dealloc(resultptr)
                raise CppToPythonException(e)
            return CppObject(resultptr, self.resulttype)

If this layer is written in RPython, features like overload resolution and C++ methods can be written in application-level python without worrying about safety.
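To make the split concrete, here is a rough sketch of what application-level overload resolution on top of such a layer might look like. It is only an illustration under some loud assumptions: CppObject and CppFunc below are pure-Python stand-ins (the "pointer" is an ordinary Python value and `impl` is a Python callable instead of a C++ function pointer), and `Overloaded` and `scale` are hypothetical names that do not come from PyPy, Reflex, or Boost.Python. Only exact matches on the type tags are handled, with no conversions.

```python
class CppObject(object):
    """Boxed value: a payload plus a hashable type tag (stand-in for ptr + type)."""
    def __init__(self, ptr, type):
        self.ptr = ptr
        self.type = type


class CppFunc(object):
    """Type-checked call. `impl` is a Python stand-in for the C++ function pointer."""
    def __init__(self, impl, resulttype, argtypes):
        self.impl = impl
        self.resulttype = resulttype
        self.argtypes = tuple(argtypes)

    def __call__(self, *args):
        if len(args) != len(self.argtypes):
            raise TypeError("expected %d arguments, got %d"
                            % (len(self.argtypes), len(args)))
        for a, t in zip(args, self.argtypes):
            if not isinstance(a, CppObject) or a.type != t:
                raise TypeError("argument type mismatch")
        result = self.impl(*[a.ptr for a in args])
        return CppObject(result, self.resulttype)


class Overloaded(object):
    """Hypothetical application-level overload set: exact match on type tags."""
    def __init__(self, *funcs):
        self.table = {}
        for f in funcs:
            self.table[f.argtypes] = f

    def __call__(self, *args):
        # Unboxed arguments get a None tag and therefore never match.
        key = tuple(getattr(a, "type", None) for a in args)
        try:
            func = self.table[key]
        except KeyError:
            raise TypeError("no overload matches %r" % (key,))
        # CppFunc checks the types again, so a mistake in this layer
        # produces a TypeError rather than a bad call.
        return func(*args)


# Two overloads of a made-up function, selected by argument types.
scale = Overloaded(
    CppFunc(lambda a, b: a * b, "int", ["int", "int"]),
    CppFunc(lambda a, b: a * b, "double", ["double", "double"]),
)
result = scale(CppObject(2, "int"), CppObject(3, "int"))
```

The duplicated check (once in Overloaded, once in CppFunc) is the redundancy mentioned elsewhere in this thread; the point is that the outer layer can be arbitrarily buggy without being unsafe.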
<snip>
As I mentioned in the blog comment, a lot of these issues come up in contexts outside C++, like numpy. Internally numpy represents operations like addition as a big list of optimized routines to call depending on the stored data type. Functions in these tables are called on raw pointers to memory, which is fundamental since numpy arrays can refer to memory inside objects from C++, Fortran, mmap, etc. It'd be really awesome if the type dispatch step could be written in python but still call into optimized C code for the final arithmetic.
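As a toy illustration of that dispatch step (not numpy's actual internals), the table can be an ordinary dictionary keyed on the stored data types. Everything named here is an assumption made for the sketch: `ADD_LOOPS`, `add`, and the two loop functions are hypothetical, the standard-library `array` module stands in for raw typed memory, and the loops are plain Python where a real binding would call optimized C routines on raw pointers.

```python
from array import array


# Stand-ins for optimized inner loops. In a real binding these would be
# C function pointers operating on raw memory.
def add_int_loop(a, b, out):
    for i in range(len(out)):
        out[i] = a[i] + b[i]


def add_float_loop(a, b, out):
    for i in range(len(out)):
        out[i] = a[i] + b[i]


# Hypothetical dispatch table:
# (left typecode, right typecode) -> (result typecode, inner loop)
ADD_LOOPS = {
    ("l", "l"): ("l", add_int_loop),
    ("d", "d"): ("d", add_float_loop),
}


def add(a, b):
    """Pick the inner loop from the stored data types, then run it."""
    if len(a) != len(b):
        raise ValueError("length mismatch")
    key = (a.typecode, b.typecode)
    try:
        result_code, loop = ADD_LOOPS[key]
    except KeyError:
        raise TypeError("no addition loop for %r" % (key,))
    out = array(result_code, [0]) * len(a)  # zero-filled output buffer
    loop(a, b, out)
    return out


total = add(array("l", [1, 2, 3]), array("l", [10, 20, 30]))
```

Only the lookup in `add` is the part that would stay in Python; the loops are where the optimized C code would plug in.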
That's the goal. Well, not exactly - the point is that you write this code in Python/RPython and the JIT is able to generate efficient assembler out of it. That's a very far-reaching goal, though: nice integration between a yet-nonexistent JIT and PyPy's yet-nonexistent numpy :-)
Asking the JIT to generate efficient code might be sufficient in this case, but in terms of this discussion it just removes numpy as a useful thought experiment towards C++ bindings. :) Also, for maximum speed I doubt the JIT will be able to match custom code such as BLAS, given that C++ compilers usually don't get there either.
The other major issue is safety: if a lot of overloading and dispatch code is going to be written in python, it'd be nice to shield that code from segfaults. I think you can get a long way there just by having a consistent scheme for boxing the three components above (pointer, type, and reference info), a way to label C function pointers with type information, and a small RPython layer that does simple type-checked calls (with no support for overloading or type conversion). I just wrote a C++ analogue to this last part as a minimal replacement for Boost.Python, so I could try to formulate what I mean in pseudocode if there's interest. There'd be some amount of duplicate type checking if higher level layers such as overload resolution were written in application-level python, but that duplication should be amenable to elimination by the JIT.
I think for now we're happy with extra overhead. We would like to have *any* working C++ bindings first and then eventually think about speeding it up.
Another advantage of splitting the code into an RPython type-safety layer and application-level code is that the latter could be shared between pypy and cpython. I haven't looked at reflex at all, but in Boost.Python most of the complexity goes into code that could exist at the application level.

Geoffrey