[Cython] Gsoc project

Dag Sverre Seljebotn d.s.seljebotn at astro.uio.no
Wed Mar 28 05:05:25 CEST 2012

On 03/27/2012 02:17 PM, Philip Herron wrote:
> Hey,
> I was pointed to your idea
> http://groups.google.com/group/cython-users/browse_thread/thread/cb8aa58083173b97/cac3cf12d438b122?show_docid=cac3cf12d438b122&pli=1
> by David Malcolm on his plugin mailing list.
> I am looking to apply to GSoC once again this year. I did GSoC in
> 2010 and 2011 on GCC, implementing my own GCC front-end for Python,
> which is still at a very early stage since it's a huge task. But I am
> tempted to apply to this project to work on something more
> self-contained and give back to the community more promptly, while
> hacking on my own front-end in my own time. I think it would also
> help me understand different aspects of Python in more detail, which
> is what I need, and I would gain a lot of experience from it.

Excellent! After talking to lots of people at PyCon about Cython, it is 
obvious that auto-generation of pxd files is *the* most missed feature 
in Cython today. If you do this, lots of Cython users will be very grateful.

> I was wondering if you could give me some more details on how this
> could all work. I am not 100% familiar with Cython, but I think I
> understand it to a good extent from playing with it for most of my
> evening. I just want to make sure I fully understand the basic use
> case. A user could have something like:
> -header foo.h
> extern int add (int, int);
> -source foo.c
> #include "foo.h"
> int add (int x, int y)
> {
>    return x+y;
> }
> We use the plugin to go over the decls created and create a pxd file like:
> cdef int add (int a, int b):
>      return a + b
> Although this is a really basic example, I just want to make sure I
> understand what's going on. Maybe some more of you have input? I guess
> this would be best suited as a proposal for Python rather than GCC?

This isn't quite what should be done. Cython generates C code that 
includes C header files; the pxd files are needed to tell Cython what 
is declared on the C side (during the Cython->C 
translation/compilation).

So: "foo.c" is irrelevant to Cython. And, foo.h should turn into foo.pxd 
like this:

cdef extern from "foo.h":
     int add(int, int)
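
For illustration, here is a minimal Python sketch of the output side of 
such a generator, assuming the declarations have already been extracted 
from the header (for instance by the gcc plugin). FuncDecl and 
render_pxd are hypothetical names, not part of Cython; parameter names 
are dropped, just as in the foo.pxd above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FuncDecl:
    """A C function prototype, as a gcc plugin might report it (hypothetical)."""
    name: str
    return_type: str
    param_types: Tuple[str, ...]


def render_pxd(header: str, decls: List[FuncDecl]) -> str:
    """Render a 'cdef extern from' block that declares each function for Cython."""
    lines = ['cdef extern from "%s":' % header]
    for d in decls:
        lines.append("    %s %s(%s)" % (d.return_type, d.name, ", ".join(d.param_types)))
    if not decls:
        lines.append("    pass")  # an extern block needs at least one statement
    return "\n".join(lines) + "\n"


print(render_pxd("foo.h", [FuncDecl("add", "int", ("int", "int"))]), end="")
```

Running it prints the same declaration as the foo.pxd above.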

Let us know if you have any questions; you may want to look at examples 
of using Cython to wrap C code, such as


and the rest of the pyzmq code.

Moving over to the idea of making this a GSoC:

First, we have a policy of requiring patches from prospective students 
in addition to their application. Often, this has been to fix a bug or 
two in Cython. However, given that pxd generation can be done without 
much digging into Cython itself, I think that something like a crude 
prototype of the pxd generator (supporting only a subset of C) would be 
a better fit (other devs, what do you think?)
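
As a rough idea of what "only a subset of C" could mean for such a 
prototype, the sketch below recognises only single-line prototypes of 
the form "[extern] type name(params);" with a regular expression. This 
is an assumed stand-in for the real front end (a gcc plugin); PROTO_RE 
and parse_prototypes are hypothetical names, and anything outside the 
subset (macros, structs, function pointers, multi-line declarations) is 
silently skipped.

```python
import re
from typing import List, Tuple

# Deliberately crude: one prototype per line, no function pointers.
PROTO_RE = re.compile(
    r"^\s*(?:extern\s+)?"               # optional 'extern'
    r"(?P<ret>[A-Za-z_][\w\s*]*?)"      # return type, matched lazily
    r"\s*\b(?P<name>[A-Za-z_]\w*)\s*"   # function name
    r"\((?P<params>[^()]*)\)\s*;\s*$"   # parameter list and trailing ';'
)


def parse_prototypes(header_text: str) -> List[Tuple[str, str, Tuple[str, ...]]]:
    """Return (return_type, name, param_types) for each simple prototype found."""
    decls = []
    for line in header_text.splitlines():
        m = PROTO_RE.match(line)
        if m is None:
            continue  # outside the supported subset: skipped, not an error
        ret = " ".join(m.group("ret").split())  # normalise whitespace
        raw = m.group("params").strip()
        params: Tuple[str, ...] = ()
        if raw and raw != "void":
            params = tuple(" ".join(p.split()) for p in raw.split(","))
        decls.append((ret, m.group("name"), params))
    return decls
```

Parameter names such as "s" in "const char *s" are kept; pxd files accept them, so there is no need to strip them at this stage.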

The project should contain at least:

  - The wrapper generator itself
  - Tests for it (including the task of figuring out how to test this, 
possibly both unit tests and integration tests)
  - A strategy for testing it for all relevant versions of gcc; one 
should probably set up Jenkins jobs for it
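
For the testing item, one cheap starting point is golden-file style unit 
tests: each case pairs the generator's input with the exact pxd text 
expected. The sketch uses Python's standard unittest; generate_pxd is a 
hypothetical, deliberately tiny stand-in for the real generator so that 
the test has something to run against. Integration tests would go 
further and actually run Cython and gcc on the generated pxd.

```python
import unittest


def generate_pxd(header_name, decls):
    """Hypothetical stand-in for the real generator. decls are
    (return_type, name, param_types) tuples already extracted from the header."""
    body = ["    %s %s(%s)" % (ret, name, ", ".join(params))
            for ret, name, params in decls]
    head = ['cdef extern from "%s":' % header_name]
    return "\n".join(head + (body or ["    pass"])) + "\n"


# Each case: (header name, extracted declarations, expected pxd text).
GOLDEN_CASES = [
    ("foo.h", [("int", "add", ("int", "int"))],
     'cdef extern from "foo.h":\n    int add(int, int)\n'),
    ("empty.h", [], 'cdef extern from "empty.h":\n    pass\n'),
]


class PxdGoldenTest(unittest.TestCase):
    def test_golden_cases(self):
        for header, decls, expected in GOLDEN_CASES:
            with self.subTest(header=header):
                self.assertEqual(generate_pxd(header, decls), expected)
```

Run it with "python -m unittest" on the containing module; adding a case is then just a matter of adding a header snippet and its expected pxd.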

Even then, I feel that this is rather small for a full GSoC. Even when 
supporting the subset of C++ that Cython supports, I would estimate a 
month or so (and GSoC is two months). So it should be extended in one 
direction or another. Some ideas:

  - Very often one is not interested in the full header file. One really 
wants "the API", not a translation of the C header. This probably 
requires a) some heuristics, and b) a way to write, as easily as 
possible, selectors/configuration for what should be included and what 
should not. Making that end-user-friendly is perhaps a challenge; I'm 
not sure.
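
To give a flavour of what such selectors might look like, here is a 
sketch that filters declaration names through include/exclude glob 
patterns with Python's standard fnmatch. The pattern-based configuration 
is an assumption, select_api is a hypothetical name, and the single 
heuristic shown (dropping names that start with an underscore) is just 
one example of the kind of rule that might be needed.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Sequence


def select_api(names: Iterable[str],
               include: Sequence[str] = ("*",),
               exclude: Sequence[str] = ()) -> List[str]:
    """Keep names matching any include pattern and no exclude pattern."""
    selected = []
    for name in names:
        if name.startswith("_"):
            continue  # example heuristic: leading underscore usually means internal
        if not any(fnmatchcase(name, pat) for pat in include):
            continue
        if any(fnmatchcase(name, pat) for pat in exclude):
            continue
        selected.append(name)
    return selected
```

With the (made-up) names foo_open, foo_close, foo_internal_hash, _foo_debug and memcpy, include=["foo_*"] and exclude=["*_internal_*"] keep only foo_open and foo_close.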

One idea here is to enable an interplay where the generator looks at the 
pyx file to see what needs to be wrapped. I.e. you first try to use a 
function in the pyx file as if it had already been declared, then run 
the pxd generator, feeding in the pyx files (and .h files), and out 
comes the required pxd file bridging the two (containing only the used 
subset).
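
A very rough sketch of that interplay, assuming the header declarations 
are already available as a name -> declaration mapping: collect the 
identifiers the pyx file calls, intersect them with the declared names, 
and emit only that subset. Finding calls with a regular expression is a 
crude assumption made for brevity (a real tool would presumably reuse 
Cython's own parser); called_names and bridge_pxd are hypothetical names.

```python
import re
from typing import Dict, Set

# Crude: any identifier followed by "(" counts as a call (this also catches "def f(").
CALL_RE = re.compile(r"\b([A-Za-z_]\w*)\s*\(")


def called_names(pyx_source: str) -> Set[str]:
    """Identifiers that appear in call position anywhere in the pyx source."""
    return set(CALL_RE.findall(pyx_source))


def bridge_pxd(header: str, declared: Dict[str, str], pyx_source: str) -> str:
    """Emit a pxd with only the header declarations the pyx file actually uses.

    declared maps a C function name to its pxd declaration line."""
    used = sorted(called_names(pyx_source) & set(declared))
    body = ["    " + declared[name] for name in used]
    if not body:
        body = ["    pass"]  # keep the extern block syntactically valid
    return "\n".join(['cdef extern from "%s":' % header] + body) + "\n"
```

For a pyx file containing "return add(a, b)" and a header declaring both add and sub, only add ends up in the generated pxd.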

  - Support using clang to parse C code, in addition to gcc

  - There's a problem in that an often-used Cython approach is:

  1) Generate C file from pyx and pxd files
  2) Ship to other computers
  3) Compile C file

However, this is fragile when combined with auto-generated pxd files, 
because the resulting pxd may be different depending on whether -DFOO is 
given to gcc or not.
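
A toy illustration of the problem: the same header yields different 
declarations depending on which macros are defined when the generator 
runs. The mini-preprocessor below only understands non-nested 
#ifdef/#else/#endif, an assumption made purely for the demo; HEADER and 
visible_lines are hypothetical.

```python
from typing import List, Set

HEADER = """\
int add(int, int);
#ifdef FOO
long add_long(long, long);
#endif
"""


def visible_lines(header: str, defined: Set[str]) -> List[str]:
    """Non-directive lines surviving a (non-nested) #ifdef/#else/#endif pass."""
    out, active = [], True
    for line in header.splitlines():
        tokens = line.split()
        if tokens[:1] == ["#ifdef"]:
            active = len(tokens) > 1 and tokens[1] in defined
        elif tokens[:1] == ["#else"]:
            active = not active
        elif tokens[:1] == ["#endif"]:
            active = True
        elif active and line.strip():
            out.append(line.strip())
    return out
```

A pxd generated with FOO defined would declare add_long; C code generated from it and shipped to a machine that builds without -DFOO would then likely fail to compile or link there.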

The above 3 steps are possible because Cython often does not care about 
the exact type of something, just basic type and signedness. So if you do

cdef extern from "foo.h":
     ctypedef int sometype_t

then sometype_t can actually be a short or a char, and Cython doesn't 
care. (Similarly, not all fields of a struct need to be exposed, only 
the ones that form part of the API.)
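
To make that concrete, here is a sketch of the kind of coarse mapping a 
generator could apply to integer typedefs, keeping only signedness. The 
type table is an assumption covering just the standard integer type 
spellings; plain char is deliberately left out because whether it is 
signed is implementation-defined. ctypedef_for is a hypothetical name.

```python
# Assumed, incomplete table of standard integer spellings, grouped by signedness.
SIGNED = {"signed char", "short", "int", "long", "long long"}
UNSIGNED = {"unsigned char", "unsigned short", "unsigned int",
            "unsigned long", "unsigned long long"}


def ctypedef_for(name: str, c_type: str) -> str:
    """Emit a loose ctypedef for an integer typedef, keeping only its signedness."""
    c_type = " ".join(c_type.split())  # normalise whitespace
    if c_type in SIGNED:
        return "ctypedef int %s" % name
    if c_type in UNSIGNED:
        return "ctypedef unsigned int %s" % name
    raise ValueError("not a plain integer type: %r" % c_type)
```

So "typedef short sometype_t;" and "typedef long sometype_t;" both become "ctypedef int sometype_t", which is exactly why the exact type chosen on the build machine does not matter to Cython here.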

However, I'm not sure if the quality of an auto-generated pxd file is 
good enough for this approach.

So either a) the wrapper generator and Cython must be plugged into the 
typical setup.py build, or b) one figures out something clever (or, 
likely, more than one clever thing) which allows one to continue using 
the above workflow.

Either a) or b), or both, could be part of the project. a) essentially 
requires looking at Cython.Distutils. b) *may* involve hooking into gcc 
*before* the preprocessor is run and taking into account #ifdef etc., 
if that is even possible, plus new features in Cython for specifying in 
a pxd file that "there's an #ifdef here", and seeing if that can somehow 
result in intelligently generated C code.
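
For a), the heart of plugging the generator into a setup.py build is 
probably just "regenerate the pxd before cythonizing whenever the header 
has changed". A minimal sketch of that check using only the standard 
library; regenerate_if_stale and its generate callback are hypothetical, 
and how it would hook into Cython.Distutils is left open.

```python
import os
from typing import Callable


def regenerate_if_stale(header_path: str, pxd_path: str,
                        generate: Callable[[str], str]) -> bool:
    """Rewrite pxd_path from header_path if it is missing or older than the header.

    generate maps header text to pxd text. Returns True if the pxd was written."""
    if (os.path.exists(pxd_path)
            and os.path.getmtime(pxd_path) >= os.path.getmtime(header_path)):
        return False  # pxd is up to date; leave it alone
    with open(header_path) as f:
        pxd_text = generate(f.read())
    with open(pxd_path, "w") as f:
        f.write(pxd_text)
    return True
```

Using file modification times mirrors what make-style build tools do; it keeps the pxd in sync on the machine that actually compiles, at the cost of needing the generator (and gcc) available there.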

PS. I should stress that a pxd generator is *very* useful -- because it 
would do 90% of the job, and even if humans need to do the last 10% it 
is still a major timesaver.

  - More straightforward than the above: parse Fortran through the 
gfortran GCC front-end. The Fwrap program 
(https://github.com/fwrap/fwrap) has been dormant in terms of 
development for the past couple of years, but is still the most 
promising way of bringing Fortran and Cython together.

Part of Fwrap's problem is the existing parser. Switching to gfortran as 
the parser would be spectacular, and would probably revive the project. 
It has a solid test suite, so one would basically replace the parser 
component of Fwrap, make sure the test suite passes, and that would be 
it.

(Of course, few people outside the scientific community care anything 
about Fortran.)

Those are some ideas. Remember: This is *your* project, so make sure you 
focus on features you'd find fun to play with and implement. And do NOT 
take all of the above, that's way too much :-), just find one or two 
extra features that help make the GSoC application really appealing.


More information about the cython-devel mailing list