[pypy-dev] GSoC 2015: cpyext project?

Amaury Forgeot d'Arc amauryfa at gmail.com
Wed Dec 3 20:16:14 CET 2014


Hello Toby,

Overall it's a nice goal, but I don't think that improving cpyext is easy.
Its goal is to reproduce the CPython API, in all its details and caveats.
I will list some of them to explain why I think it's a difficult task:

- First, PyPy objects have no fixed layout exposed to C code. For example,
PyPy has multiple implementations of lists and dicts, which are chosen at
runtime and can even change when the object is mutated. So all the
concrete functions of the CPython API need to use the abstract object
interface (e.g. PyList_GET_ITEM is not a C macro, but a Python call to
type(x).__getitem__, fetched from the class dictionary).

- Then, PyPy uses a moving garbage collector, which moves allocated objects
when they survive the first collection. This is not what users of PyObject*
pointers expect: the address has to stay the same for the life of the
object. So cpyext allocates a PyObject struct at a fixed address, and uses
a mapping (expensive!) each time the object crosses the boundary between
the interpreter and the C extension. There is even an ob_refcnt field,
which keeps track of the number of references held in C code; and
borrowed references were a nightmare to implement correctly.
And I'm sure we don't correctly handle circular references between
PyObjects...

- Finally, there is a lot of code that directly accesses C struct members
(very common: obj->ob_type->tp_name). So each time an object goes from
Python to the C extension, cpyext needs to allocate a struct which contains
all these fields, recursively, only to delete them when the call returns,
even when the C code does not actually use these fields.
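
The second point above (a fixed-address struct, a mapping consulted at
every boundary crossing, and a count of references held by C code) can be
modelled roughly in plain Python. This is only a toy sketch: cpyext's real
implementation is not written this way, and the names Proxy, ProxyTable,
to_c, from_c and decref are invented for illustration.

```python
# Toy model only; every name here is hypothetical, not cpyext's real API.
import itertools


class Proxy:
    """Stands in for a PyObject struct living at a fixed address."""

    def __init__(self, address, obj):
        self.address = address   # never changes once assigned
        self.obj = obj           # the interpreter-side object
        self.refcount = 0        # references held by C code only


class ProxyTable:
    def __init__(self):
        self._addresses = itertools.count(0x1000, 0x10)  # fake addresses
        self._by_object = {}     # id(obj) -> Proxy
        self._by_address = {}    # address -> Proxy

    def to_c(self, obj):
        """Object crosses into C: look up or create its proxy, add a ref."""
        proxy = self._by_object.get(id(obj))
        if proxy is None:
            proxy = Proxy(next(self._addresses), obj)
            self._by_object[id(obj)] = proxy
            self._by_address[proxy.address] = proxy
        proxy.refcount += 1
        return proxy.address

    def from_c(self, address):
        """Object crosses back: map the fixed address to the real object."""
        return self._by_address[address].obj

    def decref(self, address):
        """C code drops one reference; free the proxy when none remain."""
        proxy = self._by_address[address]
        proxy.refcount -= 1
        if proxy.refcount == 0:
            del self._by_address[address]
            del self._by_object[id(proxy.obj)]


table = ProxyTable()
items = [1, 2, 3]
first = table.to_c(items)    # creates the proxy
second = table.to_c(items)   # same object, so the same address comes back
```

Even in this simplified form, every crossing costs at least one dictionary
lookup, which gives a feel for why the mapping is called expensive above.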

Even if cpyext can be made a bit faster, the issues above won't disappear
as long as we want to support all the semantics implied by the CPython API.
And believe me, all the features we implemented are needed by one extension
or another.
I'd say that cpyext is quite mature, because it provides all the
infrastructure to support almost all extension modules, and went much
farther than we initially expected.
But I think it went as far as possible given the differences between
CPython and PyPy.


There is a solution though, which is also a nice project:
Since "cffi" is the preferred way to access C code from PyPy, you could
instead write a version of boost::python (maybe renamed to
boost::python_cffi) that uses cffi primitives to implement all the boost
functions: class_(), def(), and so on.

I started working on this idea some time ago, and I was able to support
the "hello world" example of boost::python.
This one:
http://www.boost.org/doc/libs/1_57_0/libs/python/doc/tutorial/doc/html/index.html#quickstart.hello_world
I need to find the code I wrote so I can share it (around 250 lines);
basically it's a rewrite of boost::python, but using a slightly different C
API (to use Python features from C++), and a completely different way to
manage memory (similar to JNI: there are Local and Global References
<http://www.science.uva.nl/ict/ossdocs/java/tutorial/native1.1/implementing/refs.html>,
and ffi.new_handle() to create references from objects). This method is
much more friendly to PyPy and its JIT (mostly because references don't
need to be memory addresses!)
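
To illustrate the JNI-style memory model mentioned above, here is a toy
sketch in plain Python where references are small integers rather than
memory addresses. It is not the code described in this message, and it
does not use cffi itself (ffi.new_handle() plays the role of creating a
reference there); HandleTable, new_local, new_global, release_global,
deref and local_frame are all hypothetical names.

```python
# Toy model of local and global handles; all names are hypothetical.
from contextlib import contextmanager


class HandleTable:
    def __init__(self):
        self._objects = {}   # handle (small integer) -> object
        self._next = 1
        self._frames = []    # stack of lists of local handles

    def _new(self, obj):
        handle = self._next
        self._next += 1
        self._objects[handle] = obj
        return handle

    def new_local(self, obj):
        """Handle valid only until the enclosing local_frame() exits."""
        handle = self._new(obj)
        self._frames[-1].append(handle)  # requires an active frame
        return handle

    def new_global(self, obj):
        """Handle valid until release_global() is called."""
        return self._new(obj)

    def release_global(self, handle):
        del self._objects[handle]

    def deref(self, handle):
        return self._objects[handle]

    @contextmanager
    def local_frame(self):
        """Wraps one call into native code; frees its locals on exit."""
        self._frames.append([])
        try:
            yield
        finally:
            for handle in self._frames.pop():
                self._objects.pop(handle, None)


table = HandleTable()
with table.local_frame():
    local = table.new_local("temporary")
    kept = table.new_global(table.deref(local))
# Here `local` is no longer valid, while `kept` still resolves.
```

Because a handle is just an index into a table, the underlying object is
free to move in memory, and no per-object reference count has to be kept
in sync with C code.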

Or maybe you'll find that boost::python is quite complex to reimplement
correctly (because it's boost), and you will decide to use the C API
described above directly. I remember there are functions like
Object_SetAttrString and PyString_FromString, and it's easy to add new ones.
Of course this requires rewriting all your bindings from scratch, but
since all the code will be in Python (with snippets of C++) you will find
that there are better ways than C++ templates to generate code from regular
patterns.
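
As a rough sketch of what generating code from regular patterns could
look like, the Python below expands a few declaration patterns once per
scalar type, much as a C++ template would be instantiated. The function
names (vector_add, vector_norm), the suffix scheme and the helper
generate_declarations are invented for illustration and do not come from
any real library; the resulting text is the kind of C declaration list
that cffi's ffi.cdef() accepts.

```python
# Hypothetical example: names and signatures are made up for illustration.
SCALAR_TYPES = {"float": "f", "double": "d"}   # C type -> name suffix

PATTERNS = [
    # (base name, result type, argument types); {t} is the scalar type
    ("vector_add", "void", ["const {t} *", "const {t} *", "{t} *", "int"]),
    ("vector_norm", "{t}", ["const {t} *", "int"]),
]


def generate_declarations(patterns, scalar_types):
    """Expand every pattern once per scalar type into C declarations."""
    lines = []
    for c_type, suffix in scalar_types.items():
        for name, result, args in patterns:
            rendered_args = ", ".join(a.format(t=c_type) for a in args)
            lines.append("{} {}_{}({});".format(
                result.format(t=c_type), name, suffix, rendered_args))
    return "\n".join(lines)


declarations = generate_declarations(PATTERNS, SCALAR_TYPES)
print(declarations)
```

Adding a new scalar type or a new function is then a one-line change to
a Python data structure rather than another template specialisation.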

I haven't yet seen any serious module that uses cffi to interface with C++,
so any progress in this direction would be awesome.



2014-11-28 20:13 GMT+01:00 Toby St Clere Smithe <mail at tsmithe.net>:

> Hi all,
>
> I've posted a couple of times on here before: I maintain a Python
> extension for GPGPU linear algebra[1], but it uses boost.python. I do
> most of my scientific computing in Python, but often am forced to use
> CPython where I would prefer to use PyPy, largely because of the
> availability of extensions.
>
> I'm looking for an interesting Google Summer of Code project for next
> year, and would like to continue working on things that help make
> high-performance computing in Python straight-forward. In particular,
> I've had my eye on the 'optimising cpyext'[2] project for a while: might
> work in that area be available?
>
> I notice that it is described with difficulty 'hard', and so I'm keen to
> enquire early so that I can get up to speed before making a potential
> application in the spring. I would love to work on getting cpyext into a
> good enough shape that both Cython and Boost.Python extensions are
> functional with minimal effort on behalf of the user. Does anyone have
> any advice? Are there particular things I should familiarise myself
> with? I know there is the module/cpyext tree, but it is quite formidable
> for someone uninitiated!
>
> Of course, I recognise that cpyext is a much trickier proposition in
> comparison with things like cffi and cppyy. In particular, I'm very
> excited by cppyy and PyCling, but they seem quite bound up in CERN's
> ROOT infrastructure, which is a shame. But it's also clear that very
> many useful extensions currently use the CPython API, and so -- as I
> have often found -- the apparent relative immaturity of cpyext keeps
> people away from PyPy, which is also a shame!
>
> [1] https://pypi.python.org/pypi/pyviennacl
> [2] https://bitbucket.org/pypy/pypy/wiki/GSOC%202014
>
> Best,
>
> Toby
>
>
> --
> Toby St Clere Smithe
> http://tsmithe.net



-- 
Amaury Forgeot d'Arc