[Cython] Gsoc project

Robert Bradshaw robertwb at gmail.com
Wed Mar 28 22:08:53 CEST 2012

On Tue, Mar 27, 2012 at 8:05 PM, Dag Sverre Seljebotn
<d.s.seljebotn at astro.uio.no> wrote:
> On 03/27/2012 02:17 PM, Philip Herron wrote:
>> Hey
>> I got linked to your idea
>> http://groups.google.com/group/cython-users/browse_thread/thread/cb8aa58083173b97/cac3cf12d438b122?show_docid=cac3cf12d438b122&pli=1
>> by David Malcolm on his plugin mailing list.
>> I am looking to apply to GSoC once again this year. I did GSoC in
>> 2010 and 2011 on GCC, implementing my own GCC front-end for Python,
>> which is still in very early stages since it's a huge task. But I am
>> tempted to apply to this project to implement a more self-contained
>> project and give back to the community more promptly, while hacking
>> on my own front-end on my own time. And I think it would benefit me
>> to understand in more detail different aspects of Python, which
>> is what I need, and I would gain a lot of experience from it.
> Excellent! After talking to lots of people at PyCon about Cython, it is
> obvious that auto-generation of pxd files is *the* most missed feature in
> Cython today. If you do this, lots of Cython users will be very grateful.

+1, this idea has been floated many times before, and I think it would
make a great GSoC project.

>> I was wondering if you could give me some more details on how this
>> could all work. I am not 100% familiar with Cython, but I think I
>> understand it to a good extent from playing with it for most of my
>> evening. I just want to make sure I fully understand the basic use
>> case. A user could have something like:
>> -header foo.h
>> extern int add (int, int);
>> -source foo.c
>> #include "foo.h"
>> int add (int x, int y)
>> {
>>   return x+y;
>> }
>> We use the plugin to go over the decls created and create a pxd file like:
>> cdef int add (int a, int b):
>>     return a + b
>> Although this is a really basic example, I just want to make sure I
>> understand what's going on. Maybe some more of you have input? I guess
>> this would be best suited as a proposal for Python rather than GCC?
> This isn't quite what should be done. Cython generates C code that includes
> C header files; what the pxd files are needed for is to provide declarations
> for Cython about what is available on the C side (during the Cython->C
> translation/compilation).
> So: "foo.c" is irrelevant to Cython. And, foo.h should turn into foo.pxd
> like this:
> cdef extern from "foo.h":
>    int add(int, int)
> Let us know if you have any questions; you may want to look at examples for
> using Cython to wrap C code, such as
> https://github.com/zeromq/pyzmq/blob/master/zmq/core/libzmq.pxd
> and the rest of the pyzmq code.
> Moving over to the idea of making this a GSoC:
> First, we have a policy of requiring patches from prospective students in
> addition to their application. Often, this has been to fix a bug or two in
> Cython. However, given that pxd generation can be done without much digging
> into Cython itself, I think that something like a crude prototype of the pxd
> generator (supporting only a subset of C) would be a better fit (other devs,
> what do you think?)

Yep, that would be sufficient for me.
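
As a rough illustration of the scale of "crude prototype" meant here, the following is a minimal sketch only, not a proposed design. It assumes one simple prototype per line and uses a regular expression in place of a real parser; the actual project would take its declarations from the compiler (e.g. a GCC plugin). The names PROTO_RE and header_to_pxd are made up for this example.

```python
import re

# Toy pattern for lines such as "extern int add (int, int);".
# Assumption: one prototype per line, no function pointers, no macros.
PROTO_RE = re.compile(
    r"^\s*(?:extern\s+)?"              # optional 'extern'
    r"(?P<ret>[A-Za-z_][\w\s\*]*?)"    # return type (shortest match)
    r"\s*\b(?P<name>[A-Za-z_]\w*)\s*"  # function name
    r"\((?P<args>[^()]*)\)\s*;\s*$"    # argument list and ';'
)


def header_to_pxd(header_name, header_text):
    """Return the text of a .pxd declaring the prototypes found in header_text."""
    lines = ['cdef extern from "%s":' % header_name]
    for raw in header_text.splitlines():
        match = PROTO_RE.match(raw)
        if match is None:
            continue  # anything the toy pattern cannot handle is skipped
        ret = " ".join(match.group("ret").split())
        args = " ".join(match.group("args").split())
        if args == "void":
            args = ""  # C's f(void) is written f() in a pxd
        lines.append("    %s %s(%s)" % (ret, match.group("name"), args))
    if len(lines) == 1:
        lines.append("    pass")  # keep the block syntactically valid
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(header_to_pxd("foo.h", "extern int add (int, int);\n"))
```

For the foo.h from the original question this prints the two-line pxd Dag gave above. Anything beyond plain prototypes (structs, typedefs, preprocessor conditionals) is exactly where a compiler-backed approach would be needed.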

> The project should contain at least:
>  - The wrapper generator itself
>  - Tests for it (including the task of figuring out how to test this,
> possibly both unit tests and integration tests)
>  - A strategy for testing it for all relevant versions of gcc; one should
> probably set up Jenkins jobs for it
> Even then, I feel that this is rather small for a full GSoC, even when
> supporting the subset of C++ supported by Cython, I would estimate a month
> or so (and GSoC is two months). So it should be extended in one direction or
> another. Some ideas:
>  - Very often one is not interested in the full header file. One really
> wants "the API", not a translation of the C header. This probably requires
> a) some heuristics, and b) a way to write, as easily as possible,
> selectors/configuration for what should be included and what should not. Making
> that end-user-friendly is perhaps a challenge, I'm not sure.
> One idea here is to enable an interplay where you look at the pyx
> file to see what needs to be wrapped. I.e. you first try to use a function in the
> pyx file as if it had already been declared, then run the pxd generator
> feeding in the pyx files (and .h files), and out comes the required pxd file
> bridging the two (containing only the used subset).
>  - Support using clang to parse C code in addition
>  - There's a problem in that an often-used Cython approach is:
>  1) Generate C file from pyx and pxd files
>  2) Ship to other computers
>  3) Compile C file
> However, this is fragile when combined with auto-generated pxd files,
> because the resulting pxd may be different depending on whether -DFOO is
> given to gcc or not.
> The above 3 steps are possible because Cython often does not care about the
> exact type of something, just basic type and signedness. So if you do
> cdef extern from "foo.h":
>    ctypedef int sometype_t
> then sometype_t can actually be a short or a char, and Cython doesn't care.
> (Similarly, not all fields of a struct need to be exposed, only the ones
> that form part of the API.)
> However, I'm not sure if the quality of an auto-generated pxd file is good
> enough for this approach.
> So either a) the wrapper generator and Cython must be plugged into the
> typical setup.py build, or b) one figures out something clever (or, likely,
> more than one clever thing) which allows one to continue using the above
> workflow.
> Either a) or b), or both, could be part of the project. a) essentially
> requires looking at Cython.Distutils. For b), it *may* involve hooking into
> gcc *before* the preprocessor is run and take into account #ifdef etc, if
> that is even possible, and new features in Cython for specifying in a pxd
> file that "there's an #ifdef here", and see if that can somehow result in
> intelligently generated C code.
> PS. I should stress that a pxd generator is *very* useful -- because it
> would do 90% of the job, and even if humans need to do the last 10% it is
> still a major timesaver.
>  - More straightforward than the above: Parse Fortran through the gfortran
> GCC frontend. The Fwrap program (https://github.com/fwrap/fwrap) has been
> dormant in terms of development over the past couple of years, but is still the most
> promising way of bringing Fortran and Cython together.
> Part of Fwrap's problem is the existing parser. Changing to using the
> gfortran as the parser would be spectacular, and probably revive the
> project. It has a solid test suite, so one would basically replace the
> parser component of Fwrap, make sure the test suite passes, and that would
> be it.
> (Of course, few people outside the scientific community care about
> Fortran.)
> Those are some ideas. Remember: This is *your* project, so make sure you
> focus on features you'd find fun to play with and implement. And do NOT take
> all of the above, that's way too much :-), just find one or two extra
> features that help make the GSoC application really appealing.
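
The pyx-driven selection idea quoted above (emit only the declarations the pyx file actually uses) could be sketched roughly as follows. This is a toy under stated assumptions: the declarations are assumed to have been extracted from the header already and are passed in as a plain dict, and "used" simply means the identifier appears somewhere in the pyx text, including in comments and strings. The names used_subset and subset_pxd are hypothetical; a real tool would presumably use Cython's own parser to find undeclared names.

```python
import re


def used_subset(pyx_text, declarations):
    """Return the pxd lines whose C name appears as an identifier in pyx_text.

    declarations maps a C name to its pxd declaration line,
    e.g. {"add": "int add(int, int)"}.
    """
    used = set(re.findall(r"[A-Za-z_]\w*", pyx_text))
    return [declarations[name] for name in sorted(declarations) if name in used]


def subset_pxd(header_name, pyx_text, declarations):
    """Build a pxd that declares only the names the pyx file refers to."""
    body = used_subset(pyx_text, declarations) or ["pass"]
    lines = ['cdef extern from "%s":' % header_name]
    lines.extend("    " + decl for decl in body)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    decls = {"add": "int add(int, int)", "sub": "int sub(int, int)"}
    pyx = "def py_add(a, b):\n    return add(a, b)\n"
    print(subset_pxd("foo.h", pyx, decls))
```

Here only add ends up in the generated pxd, since sub is never mentioned in the pyx source.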

To make the proposal more concrete (and progress measurable), I might
suggest listing 8-10 specific, non-trivial libraries that your pxd
generator should be able to handle (e.g. 65% coverage by midterm, and
100%, or even just 95%, by final evaluation). E.g. the C++ STL would be a
good candidate.

I'd love to see this supported.

- Robert
