gcc python plugin and static analyser for CPython sources

David Malcolm dmalcolm at redhat.com
Tue Jun 21 21:01:32 CEST 2011


I've been working on a new plugin for GCC, which supports embedding
Python within GCC, exposing GCC's internal data structures as Python
objects and classes.

The plugin links against libpython, and (I hope) allows you to invoke
arbitrary Python scripts from inside a compile.  My aim is to allow
people to write GCC "plugins" as Python scripts, and to make it much
easier to prototype new GCC features.
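
To give a flavour of what such a script could look like, here is a
minimal sketch that reports each function GCC compiles.  The names
gcc.register_callback and gcc.PLUGIN_PASS_EXECUTION follow the plugin's
documentation, but the API is experimental and may change; the helper
describe_function and the choice of pass name to filter on are
assumptions made for illustration.  The import is guarded so that the
helper can also be exercised outside GCC.

```python
# Sketch of a GCC "plugin" written as a Python script.
# Assumption: when GCC runs this via the plugin, `import gcc` succeeds.
try:
    import gcc  # only available inside GCC with the plugin loaded
except ImportError:
    gcc = None  # lets the helper below be tried out in plain Python


def describe_function(fn):
    """Return a one-line summary of a function object.

    `fn` is expected to look like the plugin's function wrapper: it has
    `.decl.name` and `.cfg.basic_blocks` (hypothetical helper).
    """
    return "function %s: %d basic blocks" % (
        fn.decl.name, len(fn.cfg.basic_blocks))


def on_pass_execution(optpass, fn):
    # The callback fires for every pass; filter on a single pass name
    # so that each function is reported only once.  `fn` can be None
    # for passes that are not per-function.
    if fn is not None and optpass.name == "*warn_function_return":
        print(describe_function(fn))


if gcc is not None:
    gcc.register_callback(gcc.PLUGIN_PASS_EXECUTION, on_pass_execution)
```

Such a script would be handed to GCC with something along the lines of
"gcc -fplugin=python.so -fplugin-arg-python-script=show_functions.py
test.c" (the script name is hypothetical; check the docs for the exact
invocation).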

As an example of use for the plugin, I have also been writing a static
analysis tool for checking the C code of CPython extension modules.  So
far this only checks the arguments passed to PyArg_ParseTuple*, but I'm
working on autodetecting reference counting errors, and turning these
into compile-time warnings (see [1]).
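
The idea behind the PyArg_ParseTuple* check can be sketched
independently of GCC: each format code implies a C type for the
corresponding vararg, and a mismatch becomes a warning.  The following
is a toy model, not the checker's actual code.  The table covers only a
handful of format codes, and the function name check_format and the
representation of argument types as plain strings are assumptions made
for illustration.

```python
# Toy model of checking a PyArg_ParseTuple format string against the
# C types of the varargs.  Only a few one-character codes are handled.
EXPECTED_TYPES = {
    "i": "int *",
    "l": "long *",
    "f": "float *",
    "d": "double *",
    "s": "const char **",
    "O": "PyObject **",
}


def check_format(fmt, arg_types):
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    expected = []
    for ch in fmt:
        if ch in ":;":   # the rest is a function name or error message
            break
        if ch == "|":    # marks the start of the optional arguments
            continue
        if ch not in EXPECTED_TYPES:
            problems.append("unhandled format code %r" % ch)
            return problems
        expected.append((ch, EXPECTED_TYPES[ch]))

    if len(expected) != len(arg_types):
        problems.append("format %r expects %d argument(s), but %d given"
                        % (fmt, len(expected), len(arg_types)))
        return problems

    pairs = zip(expected, arg_types)
    for index, ((code, want), got) in enumerate(pairs, 1):
        if want != got:
            problems.append("argument %d: code %r expects %s, got %s"
                            % (index, code, want, got))
    return problems
```

For instance, check_format("is", ["int *", "const char **"]) finds
nothing to complain about, whereas check_format("i", ["long *"])
reports a single type mismatch.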

The plugin is Free Software, licensed under the GPLv3 (or later).

The code can be seen here:
 
  http://git.fedorahosted.org/git/?p=gcc-python-plugin.git;a=summary

and the website for the plugin is the Trac instance here:

  https://fedorahosted.org/gcc-python-plugin/

The documentation is in the "docs" subdirectory (using sphinx).  You can
see a pre-built HTML version of the docs here:

  http://readthedocs.org/docs/gcc-python-plugin/en/latest/index.html

It's still at the "experimental proof-of-concept stage"; expect crashes
and tracebacks.

However, it is already possible to use this to add additional compiler
errors/warnings, e.g. domain-specific checks, or static analysis.

One of my goals for this is to "teach" GCC about the common mistakes
people make when writing extensions for CPython [1], but it could also
be used for other things, e.g.:
  - to teach GCC about GTK's reference-counting semantics
  - to check locking in the Linux kernel
  - to check signal-safety in APIs
  - for rapid prototyping

Other ideas include visualizations of code structure.   There are handy
methods for plotting control flow graphs (using graphviz), showing the
source code interleaved with GCC's internal representation, such as the
one here:

  http://readthedocs.org/docs/gcc-python-plugin/en/latest/cfg.html
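
For readers unfamiliar with graphviz, the gist of plotting a control
flow graph is to emit one DOT node per basic block and one edge per
jump.  The sketch below does this for a hand-written toy graph.  It
does not use the plugin's own helpers; the dictionary representation,
the function name toy_cfg_to_dot and the sample blocks are assumptions
made for illustration.

```python
def toy_cfg_to_dot(blocks, edges, name="cfg"):
    """Render a toy control flow graph as graphviz DOT source.

    blocks: dict mapping block id -> list of source lines in the block
    edges:  list of (src id, dest id, label) tuples; label may be ""
    """
    lines = ["digraph %s {" % name, "  node [shape=box];"]
    for block_id in sorted(blocks):
        # "\l" is graphviz's left-justified line break.  Only double
        # quotes are escaped here; backslashes in the source are not.
        text = "".join(line.replace('"', '\\"') + "\\l"
                       for line in blocks[block_id])
        lines.append('  bb%d [label="%s"];' % (block_id, text))
    for src, dest, label in edges:
        attrs = ' [label="%s"]' % label if label else ""
        lines.append("  bb%d -> bb%d%s;" % (src, dest, attrs))
    lines.append("}")
    return "\n".join(lines)


# A made-up three-block graph: a NULL check with two successors.
blocks = {
    0: ["list = PyList_New(0);", "if (!list)"],
    1: ["return NULL;"],
    2: ["item = PyLong_FromLong(42);"],
}
edges = [(0, 1, "true"), (0, 2, "false")]
dot_source = toy_cfg_to_dot(blocks, edges)
```

The resulting text can be written to a file and rendered with, e.g.,
"dot -Tpng".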


It could also be used to build a more general static-analysis tool.
The CPython API checker has the beginnings of this.  Example output:

test.c: In function ‘leaky’:
test.c:21:10: error: leak of PyObject* reference acquired at call to
PyList_New at test.c:21 [-fpermissive]
  test.c:22: taking False path at     if (!list)
    test.c:24: reaching here     item = PyLong_FromLong(42);
  test.c:27: taking True path at     if (!item)
  test.c:21: returning NULL

There are numerous caveats right now (e.g. how I deal with loops is
really dubious).  It's disabled for now within the source tree (I need
to fix my selftests so that they pass again).  It could perhaps be
generalized to handle e.g. {malloc, FILE*, fd} leaks, array bounds
checking, int overflow, etc., but obviously that's a far bigger task.

So far, I'm just doing a limited form of "abstract interpretation" (or,
at least, my understanding of that term): dealing with explicit finite
prefixes of traces of execution, tracking abstract values (e.g.
NULL-ptr vs non-NULL-ptr), and stopping when the trace loops.  Stopping
at a loop is just an easy way to guarantee termination, not a good one,
but I hope it is good enough for my use-case.  It also ought to make it
easier to generate highly-readable error messages.
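
The approach described above can be sketched on a toy program as
follows.  This is not the checker's code: the tuple-based instruction
set, the function explore and the abstract states ("NULL", "non-NULL",
"released") are all made up for illustration, loosely modelled on the
"leaky" example output shown earlier.

```python
# Each node is one instruction:
#   ("alloc", var, next)               var becomes NULL or non-NULL
#   ("if_null", var, on_true, on_false)
#   ("release", var, next)             the reference is given up
#   ("return", var_or_None)            None stands for "return NULL"
PROGRAM = {
    0: ("alloc", "list", 1),
    1: ("if_null", "list", 5, 2),
    2: ("alloc", "item", 3),
    3: ("if_null", "item", 5, 4),
    4: ("release", "item", 6),
    5: ("return", None),
    6: ("return", "list"),
}


def explore(program, start=0):
    """Enumerate finite traces; return a list of (trace, leaked vars)."""
    results = []
    worklist = [(start, {}, ())]   # (node, abstract values, trace)
    while worklist:
        node, values, trace = worklist.pop()
        if node in trace:
            continue               # the trace loops: stop following it
        trace = trace + (node,)
        op = program[node]
        if op[0] == "alloc":
            _, var, nxt = op
            # Split the trace: the call may fail or succeed.
            for outcome in ("NULL", "non-NULL"):
                worklist.append((nxt, {**values, var: outcome}, trace))
        elif op[0] == "if_null":
            _, var, on_true, on_false = op
            target = on_true if values[var] == "NULL" else on_false
            worklist.append((target, values, trace))
        elif op[0] == "release":
            _, var, nxt = op
            worklist.append((nxt, {**values, var: "released"}, trace))
        elif op[0] == "return":
            # Anything still owned and not handed to the caller leaks.
            leaked = sorted(v for v, state in values.items()
                            if state == "non-NULL" and v != op[1])
            results.append((trace, leaked))
    return results
```

Running explore(PROGRAM) yields three traces, of which only the one
where the second allocation fails, (0, 1, 2, 3, 5), reports "list" as
leaked.  Because each result carries the full trace, turning it into a
step-by-step message is straightforward.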

Thanks to Red Hat for allowing me to devote a substantial chunk of
$DAYJOB to this over the last couple of months.

I hope this will be helpful to both the GCC and Python communities.

Dave

[1] see
http://readthedocs.org/docs/gcc-python-plugin/en/latest/cpychecker.html
and
https://fedoraproject.org/wiki/Features/StaticAnalysisOfCPythonExtensions





