[pypy-dev] C++ library bindings
fijall at gmail.com
Thu Oct 23 14:25:20 CEST 2008
First, sorry for the late response; we're kind of busy doing other things
right now (i.e. working on a 2.5-compatible release). That doesn't mean we
don't appreciate input on our problems.
On Fri, Oct 17, 2008 at 5:50 AM, Geoffrey Irving <irving at naml.us> wrote:
> I posted a response to your blog post on C++ library bindings, and
> wanted to continue the discussion further via email if anyone's
> interested. I just signed up for the mailing list, so apologies if I
> missed a lot of previous discussion. I'll say up front that it's
> unlikely that I'll be able to devote any actual coding effort to this,
> so feel free to tell me to get lost if you have plenty of ideas and
> not enough manpower. :)
That's fine. We don't have enough manpower to work on this now, but
knowing what people do in this area is very valuable once we get to it.
> I started out writing C++ bindings using Boost.Python, and was very
> happy with it for a long time. Its strongest point is the ability to
> wrap libraries that were never designed with python in mind,
> specifically code with poor and inflexible ownership semantics.
> Internally, this means that C++ objects are exposed indirectly through
> a holder object containing either an inline copy of the C++ object or
> any type of pointer holding the object. Every access to the object
> has to go through runtime dispatch in order to work with any possible
> holder type. The holder also contains the logic for ownership and
> finalization. For example, Boost.Python can return a reference to a
> field inside another object, in which case the holder will keep a
> reference to the parent object to keep it alive as long as the field
> reference lives.
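The holder-with-owner pattern described above can be sketched in plain Python
(hypothetical names; this is not Boost.Python's actual internals, just the
keep-alive idea):

```python
import weakref

class Holder:
    """Sketch of a Boost.Python-style holder: wraps a value and pins an
    optional owner object for as long as this holder is alive."""
    def __init__(self, value, owner=None):
        self.value = value  # the wrapped object (stands in for the C++ pointer)
        self.owner = owner  # parent kept alive for this holder's lifetime

class Vector:
    def __init__(self, x):
        self.x = x

class Array:
    def __init__(self, items):
        self.items = items
    def get(self, i):
        # Returning a "reference" to an element: the holder pins the array,
        # so the field reference can outlive the caller's array binding.
        return Holder(self.items[i], owner=self)

arr = Array([Vector(1), Vector(2)])
ref = arr.get(0)
alive = weakref.ref(ref.owner)
del arr
assert alive() is not None  # the array survives because the holder owns it
assert ref.value.x == 1
```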
> The problem with this generality is that it produces a huge amount of
> object code (wrapping a single function in Boost.Python can add 10k to
> the object file), and adds a lot of runtime indirection.
> Assuming that one is writing C++ bindings because of speed issues,
> it'd be nice if this extra layer of memory indirection and runtime
> dispatch was exposed to the (eventual) JIT. In order to do that, pypy
> would have to be capable of handling pointers to raw memory containing
> non-python objects (is this already true due to the ctypes stuff?)
That's true. PyPy is able to handle pointers to any C place.
> .. with
> separate information about type and ownership.
We don't provide this, since C has no notion of that at all.
> For example, if you
> have bindings for a C++ vector class and a C++ array containing the
> vectors, a "reference" to an individual vector in the array is really
> three different pieces:
> 1. The actual pointer to the vector.
> 2. A type structure containing functions to be called with the pointer
> (1) as an argument.
> 3. A list of references to other objects that need to stay alive while
> this reference lives.
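The three pieces above could be boxed together in Python roughly as follows
(a sketch: `CppRef` and the vec3 type table are made-up names, and a ctypes
buffer stands in for raw C++ array memory):

```python
import ctypes

class CppRef:
    """Sketch of a reference split into the three pieces described above."""
    def __init__(self, ptr, typeinfo, keepalive=()):
        self.ptr = ptr                    # 1. raw address of the object
        self.typeinfo = typeinfo          # 2. table of functions for the type
        self.keepalive = list(keepalive)  # 3. objects that must outlive us

    def call(self, name, *args):
        # Dispatch through the type table; a JIT could inline this lookup.
        return self.typeinfo[name](self.ptr, *args)

# Demo: a ctypes buffer playing the role of a C++ array of two 3-vectors.
buf = (ctypes.c_double * 6)(1.0, 2.0, 3.0, 4.0, 5.0, 6.0)

def vec_norm_sq(ptr):
    # Operates directly on raw memory at the given address.
    xs = (ctypes.c_double * 3).from_address(ptr)
    return sum(v * v for v in xs)

vec3_type = {"norm_sq": vec_norm_sq}
first = CppRef(ctypes.addressof(buf), vec3_type, keepalive=[buf])
assert first.call("norm_sq") == 14.0  # 1 + 4 + 9
```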
> If pypy and the JIT ends up able to treat these pieces separately,
> it'd be a significant performance win over libraries wrapped with
> Boost.Python.
> The other main source of slowness and complexity in Boost.Python is
> overloading support, but I think that part is fairly straightforward
> to handle in the python level. All Boost.Python does internally is
> loop over the set of functions registered for a given name, and for
> each one loop over the arguments calling into its converter registry
> to see if the python object can be converted to the C++ type.
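That loop can be sketched at the Python level like so (all names hypothetical;
converters return None on failure, and the first overload whose conversions
all succeed wins):

```python
def to_int(x):
    # Converter: accept ints/floats, reject everything else.
    return int(x) if isinstance(x, (int, float)) else None

def to_str(x):
    return x if isinstance(x, str) else None

# Registered overloads for one name: (argument converters, implementation).
overloads = [
    ([to_int, to_int], lambda a, b: a + b),        # f(int, int)
    ([to_str, to_str], lambda a, b: a + " " + b),  # f(str, str)
]

def dispatch(args):
    for converters, impl in overloads:
        if len(converters) != len(args):
            continue
        converted = [conv(a) for conv, a in zip(converters, args)]
        if all(c is not None for c in converted):
            return impl(*converted)
    raise TypeError("no matching overload")

assert dispatch((1, 2)) == 3
assert dispatch(("a", "b")) == "a b"
```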
> As I mentioned in the blog comment, a lot of these issues come up in
> contexts outside C++, like numpy. Internally numpy represents
> operations like addition as a big list of optimized routines to call
> depending on the stored data type. Functions in these tables are
> called on raw pointers to memory, which is fundamental since numpy
> arrays can refer to memory inside objects from C++, Fortran, mmap,
> etc. It'd be really awesome if the type dispatch step could be
> written in python but still call into optimized C code for the final
> operations.
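The dispatch-table pattern being described might look like this in Python,
with ctypes buffers standing in for raw array memory (a sketch, not numpy's
actual internals):

```python
import ctypes

def add_double(pa, pb, pout, n):
    # Inner loop operating on raw pointers; in numpy this would be
    # optimized C selected from a table by dtype.
    a = (ctypes.c_double * n).from_address(pa)
    b = (ctypes.c_double * n).from_address(pb)
    out = (ctypes.c_double * n).from_address(pout)
    for i in range(n):
        out[i] = a[i] + b[i]

LOOPS = {"d": add_double}  # dtype code -> inner loop for that data type

def add(typecode, bufa, bufb, bufout, n):
    # Python-level type dispatch; the selected loop runs on raw addresses.
    loop = LOOPS[typecode]
    loop(ctypes.addressof(bufa), ctypes.addressof(bufb),
         ctypes.addressof(bufout), n)

a = (ctypes.c_double * 3)(1, 2, 3)
b = (ctypes.c_double * 3)(10, 20, 30)
out = (ctypes.c_double * 3)()
add("d", a, b, out, 3)
assert list(out) == [11.0, 22.0, 33.0]
```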
That's the goal. Well, not exactly - the point is that you write this code
in Python/RPython and the JIT is able to generate efficient assembler out
of it. That's a very far-reaching goal though, to have nice integration
between a yet-nonexistent JIT and PyPy's yet-nonexistent numpy :-)
> The other major issue is safety: if a lot of overloading and dispatch
> code is going to be written in python, it'd be nice to shield that
> code from segfaults. I think you can get a long way there just by
> having a consistent scheme for boxing the three components above
> (pointer, type, and reference info), a way to label C function
> pointers with type information, and a small RPython layer that does simple
> type-checked calls (with no support for overloading or type
> conversion). I just wrote a C++ analogue to this last part as a
> minimal replacement for Boost.Python, so I could try to formulate what
> I mean in pseudocode if there's interest. There'd be some amount of
> duplicate type checking if higher level layers such as overload
> resolution were written in application level python, but that
> duplication should be amenable to elimination by the JIT.
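A minimal type-checked call layer of the kind described could look like this
(a Python sketch with hypothetical names; a ctypes callback stands in for a
real C function pointer):

```python
import ctypes

# A C function pointer type: int f(int, int).
CFUNC = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int, ctypes.c_int)

class TypedFunc:
    """A C function pointer labelled with its argument and result types.
    Calls are checked against the label; no overloading, no conversion."""
    def __init__(self, cfunc, argtypes, restype):
        self.cfunc = cfunc
        self.argtypes = argtypes  # the "label": expected ctypes types
        self.restype = restype

    def __call__(self, *args):
        if len(args) != len(self.argtypes):
            raise TypeError("wrong number of arguments")
        for a, t in zip(args, self.argtypes):
            if not isinstance(a, t):  # strict check, no coercion
                raise TypeError("expected %s" % t.__name__)
        return self.cfunc(*(a.value for a in args))

# Stand-in for a real C function, wrapped as a C function pointer.
add = TypedFunc(CFUNC(lambda a, b: a + b),
                (ctypes.c_int, ctypes.c_int), ctypes.c_int)

assert add(ctypes.c_int(2), ctypes.c_int(3)) == 5
try:
    add(ctypes.c_int(1), 2.5)  # mistyped argument is rejected
except TypeError:
    pass
```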
I think for now we're happy with the extra overhead. We would like to have
*any* working C++ bindings first and only then think about speeding them
up.
> That's enough for now. I'll look forward to the discussion. Most of
> my uses of python revolve heavily around C++ bindings, so it's
> exciting to see that you're starting to think about it even if it's a
> long way off.
Thank you :)