[Cython] Gsoc project

David Malcolm dmalcolm at redhat.com
Thu Mar 29 18:25:28 CEST 2012


On Thu, 2012-03-29 at 11:10 +0100, mark florisson wrote:

Thanks for CCing me; various comments inline below throughout.

> On 29 March 2012 04:28, Dag Sverre Seljebotn <d.s.seljebotn at astro.uio.no> wrote:
> > On 03/28/2012 07:58 PM, Philip Herron wrote:
> >>
> >> Hey all
> >>
> >> I have implemented a very crude, simplistic and very badly programmed
> >> version of a pxd generator. I think I understand what we're after now,
> >> but I would appreciate it if you looked over what I did to make sure I
> >> have grasped the basic idea for now:

[...snip example sources...]

> >> We run gcc -fplugin=./python.so -fplugin-arg-python-script=walk.py test.c

FWIW, the plugin has a helper script, so that you ought to be able to
simply run:
  ./gcc-with-python walk.py test.c
(paths permitting)

My primary use-case for the plugin is my libcpychecker code which
implements static analysis of refcount-handling, and for that I have
another helper script "gcc-with-cpychecker" that invokes my code so that
you can simply run:
  ./gcc-with-cpychecker -I/usr/include/python2.7 test.c

So you might want to do something similar for the pxd generation.
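
A minimal sketch of what such a wrapper could look like, written in Python
rather than shell.  The names "gcc-with-pxdgen" and "pxdgen.py" are made up
for illustration, and it assumes the plugin was built as ./python.so; the
two -fplugin options are the ones from the command line quoted above.

```python
"""Hypothetical 'gcc-with-pxdgen' wrapper: a sketch, not part of the plugin."""
import subprocess

# Assumptions: the plugin lives at ./python.so and the pxd generator script
# is called pxdgen.py.  Both names are placeholders.
PLUGIN = "./python.so"
SCRIPT = "pxdgen.py"


def build_command(gcc_args, plugin=PLUGIN, script=SCRIPT):
    """Return the gcc argv that loads the plugin and runs the script."""
    return [
        "gcc",
        "-fplugin=%s" % plugin,
        "-fplugin-arg-python-script=%s" % script,
    ] + list(gcc_args)


def main(argv):
    """Forward every remaining argument (-I flags, source files) to gcc."""
    return subprocess.call(build_command(argv[1:]))
```

To use it as a script you would finish the file with
"sys.exit(main(sys.argv))" (plus "import sys"), after which
"./gcc-with-pxdgen -I/usr/include/python2.7 test.c" behaves like the
longer command line.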

[...snip sample output...]

> > Another slight complication is that you should ideally turn
> >
> > #define FOO 3
> > #define BAR 4
> >
> > into
> >
> > cdef extern from "foo.h":
> >    enum:
> >        FOO
> >        BAR
> >
> > so you need to hook in before the preprocessor and after the preprocessor
> > and dig out different stuff.
> 
> David, I'm CCing you as this might be of interest to you.

Very much so - thanks!  (Hi everyone!)

FWIW, I happened to see Dag's earlier email via a google search, and
added the Cython idea to the list of "Ideas for using the GCC plugin"
here:
http://gcc-python-plugin.readthedocs.org/en/latest/getting-involved.html#ideas-for-using-the-plugin

> I think the current GCC plugin support doesn't allow you to do much
> with the preprocessor, it operates entirely after the C preprocessor
> has run. 
So far, yes.  I haven't explored GCC's C frontend as much as I have the
stages that follow.  The C preprocessor does run in-process; I don't
know yet to what extent it's amenable to hacking via a GCC plugin.  I
believe that aspects of its integration may have been rewritten somewhat
in GCC 4.7 (some of my colleagues tried to improve the line-numbering
capture in the presence of macros).

> So to support macros we have to consider that for this to
> work the gcc plugin may have to be extended, which uses C to extend
> GCC and Python, so it also requires knowledge of the CPython C API.
Yes; I'd expect you to have to go digging into the guts of the GCC C
preprocessor implementation, using GDB.

I don't know yet how feasible it is to get at the data from a plugin: it
might be anywhere from "easy" to "impossible".  You might need to get a
patch into GCC to expose the necessary information (if so, that would
probably be worthy of a GSoC slot, I think).

One issue is that although GCC has an API for plugins to use to register
themselves, it doesn't yet have an official API for plugins to use for
doing anything else, so we're somewhat at the mercy of future GCC
developments (hopefully Python will make it easier to survive future
internal interface changes though).

BTW, the Python plugin's API isn't 100% frozen yet: I still reserve the
right to tweak things if appropriate (I've only done this occasionally
though, and when I do, I go through all the code I know of to
double-check whether I'm about to break something).

> David, would you mind elaborating why C was used for this project and
> not (partially) Cython, and would it be possible to extend the plugin
> with Cython?
I did initially try using Cython: see the early commits here:
http://git.fedorahosted.org/git/?p=gcc-python-plugin.git;a=commitdiff;h=4d62721d519008c325d7369f1330dc09080c0b51
http://git.fedorahosted.org/git/?p=gcc-python-plugin.git;a=commitdiff;h=9b5145955c823453404c49e4b295e8c739c5ff44

but GCC internals are just too, err, "baroque" (that's a euphemism): they
make very heavy use of the C preprocessor (e.g. *all* field accesses go
through an access macro; there are garbage-collection annotations
throughout); many of the types are declared by repeatedly #include-ing
.def files, using macro definitions to expand the contents in a variety
of ways.

> > Then what happens if you have
> >
> > #ifdef FOO
> > #define BAR 3
> > #else
> > #define BAR 4
> > #endif
> >
> > ?? I'm not saying it is hard, but perhaps no longer completely trivial :-)
Yeah.  I have no idea to what extent the C preprocessor stuff is exposed
internally, and if the branching logic is preserved in any way that's
usable.
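
Until that is known, the simple case quoted earlier (plain integer
#defines) can be illustrated without the plugin at all.  The sketch below
is a plain-text scan of the header, which is a different technique from
anything the plugin does: it does not evaluate #ifdef branches, it just
lists a name once however many branches define it, and it skips every
macro that is not "NAME <integer literal>".  The function name
defines_to_pxd is made up for this example.

```python
import re

# Matches only '#define NAME <integer literal>' (decimal or hex) with nothing
# else on the line.  Function-like macros, expressions, strings and defines
# followed by a comment are deliberately skipped.
_DEFINE_RE = re.compile(
    r"^[ \t]*#[ \t]*define[ \t]+([A-Za-z_]\w*)[ \t]+(0[xX][0-9a-fA-F]+|\d+)[ \t]*$",
    re.MULTILINE,
)


def defines_to_pxd(header_text, header_name):
    """Return a pxd 'enum:' block naming each simple integer #define once."""
    names = []
    for match in _DEFINE_RE.finditer(header_text):
        name = match.group(1)
        if name not in names:  # e.g. BAR defined in both #ifdef branches
            names.append(name)
    if not names:
        return ""  # an empty 'enum:' block would not be valid
    lines = ['cdef extern from "%s":' % header_name, "    enum:"]
    lines += ["        %s" % name for name in names]
    return "\n".join(lines)


print(defines_to_pxd("#define FOO 3\n#define BAR 4\n", "foo.h"))
```

Because the anonymous enum only declares the names and leaves the values
to the C compiler, the conditional BAR case is harmless here; it becomes a
real problem only once a tool needs the actual values.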

[...snip...]

> > Does gccgo use the C ABI so that Cython could call it? If so, go for it!
> >
> > (Fortran is actually very much in use in the Cython userbase and would get a
> > lot more "customers" than Go, but if you have more of a CS background or
> > similar I can see why you wouldn't be so interested in Fortran. I didn't
> > believe people were still using Fortran either until I started doing
> > astrophysics, and suddenly it seems to be the default tool everybody uses
> > for everything.)
I downloaded Philip's script from
http://mail.python.org/pipermail/cython-devel/attachments/20120329/cdeb9453/attachment.py

It's running immediately before "free_lang_data", which is the first
interprocedural "whole-file" optimization pass, after some per-function
passes have been run.

You can see a map of the passes here:
http://gcc-python-plugin.readthedocs.org/en/latest/tables-of-passes.html

[See also
http://gcc-python-plugin.readthedocs.org/en/latest/callbacks.html#gcc.PLUGIN_PASS_EXECUTION
for notes on how the sample code I showed Dag at PyCon works]

So my guess is that this code can be run for *all* languages that GCC
can handle: all of the language frontends feed in data near the top of
that map, so in theory this ought to work for Fortran, C++, Go, etc.

Having said that, I've been trying to get my libcpychecker code running
on C++ and I keep running into subtle differences in the exact data the
frontends generate: e.g. the C++ frontend seems to add Nop statements for
empty functions, whereas the C frontend doesn't; type declarations get
hidden inside namespace objects in the C++ frontend; etc.
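
For reference, hooking a script in at that point can be sketched roughly
as follows.  This is not Philip's script.  It assumes the pass is reported
under the name "*free_lang_data" (check the table of passes for your GCC
version) and that each translation unit exposes its top-level declarations
as unit.block.vars; describe_decl is a helper invented for this example.
The "gcc" module only exists inside the plugin, so the import is guarded
to keep the file loadable elsewhere.

```python
try:
    import gcc  # provided by the GCC Python plugin; absent elsewhere
except ImportError:
    gcc = None

# Assumption: the pass name as the plugin reports it.
TARGET_PASS = "*free_lang_data"


def describe_decl(decl):
    """Return (name, filename) for a declaration-like object."""
    location = getattr(decl, "location", None)
    filename = location.file if location is not None else None
    return (getattr(decl, "name", None), filename)


def on_pass_execution(ps, fun):
    """PLUGIN_PASS_EXECUTION callback: act once, just before TARGET_PASS."""
    if ps.name != TARGET_PASS:
        return []
    found = []
    for unit in gcc.get_translation_units():
        # Assumption: top-level declarations hang off unit.block.vars.
        for decl in unit.block.vars:
            found.append(describe_decl(decl))
            print("%s declared in %s" % found[-1])
    return found


if gcc is not None:
    gcc.register_callback(gcc.PLUGIN_PASS_EXECUTION, on_pass_execution)
```

The callback is invoked for every pass, so the name check is what limits
the work to a single whole-file walk.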

BTW, some stylistic nits on Philip's script:
   * don't match types based on strings, cf.:
       if T == "<type 'gcc.FunctionDecl'>":
     instead, use isinstance:
       if isinstance(decl, gcc.FunctionDecl):
     so that you're not relying on repr() or str(), and so that you match
     subclasses, not just one class

   * "decl_location_get_file (decl)" jumps through lots of hoops to get
     at the filename of a decl.location by parsing the repr().  But you can
     simply look at the decl.location.file attribute:
     http://gcc-python-plugin.readthedocs.org/en/latest/basics.html#gcc.Location.file

   * similar considerations apply to decl_identifier_node_to_string();
     have a look at the dir() of the object (and if something is not
     documented, file a bug, or a patch!).
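
The difference between the two styles can be shown with stand-in classes,
so that the snippet runs outside GCC.  Location, Declaration, FunctionDecl
and SpecialFunctionDecl below are stand-ins written for this example, not
the plugin's own classes; inside the plugin you would test against
gcc.FunctionDecl and read decl.location.file directly.

```python
# Stand-ins for the plugin's classes; hypothetical, for illustration only.
class Location(object):
    def __init__(self, file, line):
        self.file = file
        self.line = line


class Declaration(object):
    def __init__(self, name, location):
        self.name = name
        self.location = location


class FunctionDecl(Declaration):
    pass


class SpecialFunctionDecl(FunctionDecl):  # hypothetical subclass
    pass


def matches_by_string(decl):
    # Fragile: depends on the exact text of the class name.
    return type(decl).__name__ == "FunctionDecl"


def matches_by_isinstance(decl):
    # Robust: also accepts subclasses of FunctionDecl.
    return isinstance(decl, FunctionDecl)


decl = SpecialFunctionDecl("main", Location("test.c", 1))
print(matches_by_string(decl), matches_by_isinstance(decl), decl.location.file)
```

The string comparison misses the subclass while isinstance accepts it, and
the filename comes straight from an attribute with no parsing of repr().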

Hope this is helpful; good luck!
Dave


