Static analysis of CPython using coccinelle/spatch
Has anyone else looked at using Coccinelle/spatch[1] on CPython source code?

It's a GPL-licensed tool for matching semantic patterns in C source code. It's been used on the Linux kernel for detecting and fixing problems, and for autogenerating patches when refactoring (http://coccinelle.lip6.fr/impact_linux.php). Although it's implemented in OCaml, it is scriptable using Python.

I've been experimenting with using it on CPython code, both on the core implementation, and on C extension modules.

As a test, I've written a validator for the mini-language used by PyArg_ParseTuple and its variants. My code examines the types of the variables passed as varargs, and attempts to check that they are correct, according to the rules here http://docs.python.org/c-api/arg.html (and in Python/getargs.c)

It can detect this old error (fixed in svn r34931):

buggy.c:12:socket_htons:Mismatching type of argument 1 in ""i:htons"": expected "int *" but got "unsigned long *"

Similarly, it finds the deliberate error in xxmodule.c:

xxmodule.c:207:xx_roj:unknown format char in "O#:roj": '#'

(Unfortunately, when run on the full source tree, I see numerous messages, and as far as I can tell, the others are false positives)

You can see the code here:
http://fedorapeople.org/gitweb?p=dmalcolm/public_git/check-cpython.git;a=tre...
and download using anonymous git in this manner:
git clone git://fedorapeople.org/home/fedora/dmalcolm/public_git/check-cpython.git

The .cocci file detects invocations of PyArg_ParseTuple and determines the types of the arguments. At each matching call site it invokes python code, passing the type information to validate.py's validate_types.

(I suspect it's possible to use spatch to detect reference counting antipatterns; I've also attempted 2to3 refactoring of c code using semantic patches, but so far macros tend to get in the way).

Alternatively, are there any other non-proprietary static analysis tools for CPython?

Thoughts?

Dave

[1] http://coccinelle.lip6.fr/
On Mon, Nov 16, 2009 at 12:27, David Malcolm
Has anyone else looked at using Coccinelle/spatch[1] on CPython source code?
Not that has been mentioned on the list before.
It's a GPL-licensed tool for matching semantic patterns in C source code. It's been used on the Linux kernel for detecting and fixing problems, and for autogenerating patches when refactoring (http://coccinelle.lip6.fr/impact_linux.php). Although it's implemented in OCaml, it is scriptable using Python.
I've been experimenting with using it on CPython code, both on the core implementation, and on C extension modules.
As a test, I've written a validator for the mini-language used by PyArg_ParseTuple and its variants. My code examines the types of the variables passed as varargs, and attempts to check that they are correct, according to the rules here http://docs.python.org/c-api/arg.html (and in Python/getargs.c)
It can detect this old error (fixed in svn r34931): buggy.c:12:socket_htons:Mismatching type of argument 1 in ""i:htons"": expected "int *" but got "unsigned long *"
Similarly, it finds the deliberate error in xxmodule.c: xxmodule.c:207:xx_roj:unknown format char in "O#:roj": '#'
(Unfortunately, when run on the full source tree, I see numerous messages, and as far as I can tell, the others are false positives)
You can see the code here: http://fedorapeople.org/gitweb?p=dmalcolm/public_git/check-cpython.git;a=tre... and download using anonymous git in this manner: git clone git://fedorapeople.org/home/fedora/dmalcolm/public_git/check-cpython.git
The .cocci file detects invocations of PyArg_ParseTuple and determines the types of the arguments. At each matching call site it invokes python code, passing the type information to validate.py's validate_types.
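For readers who have not opened the repository, the kind of check described above can be sketched in a few lines of Python. This is a simplified illustration only, not the actual validate.py from check-cpython: the table of format codes covers a small subset of those documented for PyArg_ParseTuple, and the function signature, the location string, and the list-of-type-strings input are all assumptions made for the example.

```python
# Hypothetical, simplified sketch of a format-string validator.
# NOT the real validate.py; names and signature are assumptions.

# A small subset of PyArg_ParseTuple format codes, mapped to the C
# pointer type each one expects among the varargs.
EXPECTED_TYPES = {
    "i": "int *",
    "l": "long *",
    "k": "unsigned long *",
    "H": "unsigned short *",
    "f": "float *",
    "d": "double *",
    "O": "PyObject **",
}


def validate_types(location, fmt, arg_types):
    """Return a list of error messages for one call site.

    location  -- "file:line:function" prefix for messages (assumed format)
    fmt       -- the format string, e.g. "i:htons"
    arg_types -- C types of the varargs, as strings, in call order
    """
    codes = fmt.split(":", 1)[0]  # drop the ":funcname" suffix, if any
    errors = []
    arg_index = 0
    for code in codes:
        if code == "|":  # marks the start of optional arguments
            continue
        if code not in EXPECTED_TYPES:
            errors.append(
                f"{location}:unknown format char in \"{fmt}\": '{code}'"
            )
            return errors  # argument alignment is unknown from here on
        arg_index += 1
        if arg_index > len(arg_types):
            errors.append(f"{location}:too few arguments for \"{fmt}\"")
            return errors
        expected = EXPECTED_TYPES[code]
        actual = arg_types[arg_index - 1]
        if actual != expected:
            errors.append(
                f"{location}:Mismatching type of argument {arg_index} "
                f"in \"{fmt}\": expected \"{expected}\" but got \"{actual}\""
            )
    return errors


if __name__ == "__main__":
    for message in validate_types(
        "buggy.c:12:socket_htons", "i:htons", ["unsigned long *"]
    ):
        print(message)
```

A real checker would also need to handle multi-character codes such as "s#" and converter codes such as "O&", which this sketch deliberately reports as unknown.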
(I suspect it's possible to use spatch to detect reference counting antipatterns; I've also attempted 2to3 refactoring of c code using semantic patches, but so far macros tend to get in the way).
Alternatively, are there any other non-proprietary static analysis tools for CPython?
Specific to CPython? No. But I had a chance to run practically every major commercial static analysis tool over the code base back in 2006. We also occasionally run valgrind over the code. But thanks to how we have structured the code and taken performance shortcuts, static analysis tools easily get tripped up by CPython (as you have discovered).
Thoughts?
Running the tool over the code base and reporting the found bugs would be appreciated. -Brett
Dave
[1] http://coccinelle.lip6.fr/
On Tue, 2009-11-17 at 13:03 -0800, Brett Cannon wrote:
On Mon, Nov 16, 2009 at 12:27, David Malcolm wrote:
Has anyone else looked at using Coccinelle/spatch[1] on CPython source code? [snip]
Running the tool over the code base and reporting the found bugs would be appreciated.
Discounting the false positives, the only issue it finds in python itself (trunk) is the deliberate mistake in Modules/xxmodule.c

I also ran it on a random sample of extension modules and found some real bugs (only reported downstream so far, within Fedora's bug tracker):

- DBus python bindings assume in one place that "unsigned long" is 32 bits wide: https://bugzilla.redhat.com/show_bug.cgi?id=538225
- MySQL-python assumes in one place that sizeof(int) == sizeof(long): https://bugzilla.redhat.com/show_bug.cgi?id=538234
- rpm.ps.append() uses unrecognized 'N' format specifier: https://bugzilla.redhat.com/show_bug.cgi?id=538218
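The first two bugs both rest on assumptions about integer widths that hold on some platforms and not on others. As a small illustration (not part of the tool), the widths on the machine at hand can be inspected from Python with the standard ctypes module; the variable names are made up for this example.

```python
import ctypes

# Sizes in bytes of the C types involved in the two width assumptions.
int_size = ctypes.sizeof(ctypes.c_int)
long_size = ctypes.sizeof(ctypes.c_long)
ulong_bits = 8 * ctypes.sizeof(ctypes.c_ulong)

# True only on platforms where the MySQL-python assumption happens to hold.
int_equals_long = int_size == long_size

# True only on platforms where the DBus bindings' assumption happens to hold.
ulong_is_32_bits = ulong_bits == 32

if __name__ == "__main__":
    print("sizeof(int) =", int_size)
    print("sizeof(long) =", long_size)
    print("unsigned long is", ulong_bits, "bits wide")
    print("sizeof(int) == sizeof(long):", int_equals_long)
    print("unsigned long is 32 bits:", ulong_is_32_bits)
```

On typical 64-bit Linux systems both assumptions are false, which is why passing the address of an int where the format code expects a long (or the reverse) can corrupt adjacent memory or leave part of the value uninitialized.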
On Mon, Nov 16, 2009 at 03:27:53PM -0500, David Malcolm wrote:
Has anyone else looked at using Coccinelle/spatch[1] on CPython source code?
For an excellent explanation of Coccinelle, see http://lwn.net/Articles/315686/. --amk
A.M. Kuchling wrote:
On Mon, Nov 16, 2009 at 03:27:53PM -0500, David Malcolm wrote:
Has anyone else looked at using Coccinelle/spatch[1] on CPython source code?
For an excellent explanation of Coccinelle, see http://lwn.net/Articles/315686/.
For those who have not looked, Coccinelle means ladybug (a bug-eating bug ;-) in French. Its principal use is to take C code and a SmPL file of high-level patch descriptions (fixers, in 2to3 talk) and produce a standard diff file. I wonder if this could be used to help people migrate C extensions to 3.1, by developing a SmPL file with the needed changes dictated by API changes. This is similar to its motivating application to Linux.

From http://coccinelle.lip6.fr/

"Coccinelle is a program matching and transformation engine which provides the language SmPL (Semantic Patch Language) for specifying desired matches and transformations in C code. Coccinelle was initially targeted towards performing collateral evolutions in Linux. Such evolutions comprise the changes that are needed in client code in response to evolutions in library APIs, and may include modifications such as renaming a function, adding a function argument whose value is somehow context-dependent, and reorganizing a data structure."

As I understand it, the problem with C extensions and 3.1 is the current lack of a "collateral evolution" tool like 2to3 for Python code.

Terry Jan Reedy
On Tue, 2009-11-17 at 19:45 -0500, Terry Reedy wrote:
A.M. Kuchling wrote:
On Mon, Nov 16, 2009 at 03:27:53PM -0500, David Malcolm wrote:
Has anyone else looked at using Coccinelle/spatch[1] on CPython source code?
For an excellent explanation of Coccinelle, see http://lwn.net/Articles/315686/.
For those who have not looked, Coccinelle means ladybug (a bug-eating bug ;-) in French. Its principal use is to take C code and a SmPL file of high-level patch descriptions (fixers, in 2to3 talk) and produce a standard diff file. I wonder if this could be used to help people migrate C extensions to 3.1, by developing a SmPL file with the needed changes dictated by API changes. This is similar to its motivating application to Linux. From http://coccinelle.lip6.fr/
"Coccinelle is a program matching and transformation engine which provides the language SmPL (Semantic Patch Language) for specifying desired matches and transformations in C code. Coccinelle was initially targeted towards performing collateral evolutions in Linux. Such evolutions comprise the changes that are needed in client code in response to evolutions in library APIs, and may include modifications such as renaming a function, adding a function argument whose value is somehow context-dependent, and reorganizing a data structure. "
As I understand it, the problem with C extensions and 3.1 is the current lack of a "collateral evolution" tool like 2to3 for Python code.

Indeed; I think it may be possible to use Coccinelle for this.
Here's a .cocci semantic patch to convert non-PyObject* dereferences of an "ob_type" field to use Py_TYPE macro instead.

@@
PyObject *py_obj_ptr;
type T;
T non_py_obj_ptr;
@@
(
py_obj_ptr->ob_type
|
- non_py_obj_ptr->ob_type
+ Py_TYPE(non_py_obj_ptr)
)

I was able to use this to generate the attached patch for the DBus python bindings. Note that it leaves dereferences of a PyObject* untouched, and works inside sub-expressions. (There's some noise at the typedef of Server; I don't know why).

Hope this is helpful
Dave