Hi Cython-Developers,

I am using Cython to generate a trivial wrapper for a small subset of the (already small) interface of libp11 (see https://github.com/OpenSC/libp11 for information about libp11). I know a bit about Python extension modules and just wanted to avoid writing all that boilerplate code, so I decided to give Cython a try.

The wrapper was done in a day and was no big deal, but recently I started getting random segmentation faults from it.

After a day of debugging I found the cause in my use of the __dealloc__ special method. You may now call me stupid, because it is all in the docs:
You need to be careful what you do in a __dealloc__() method. By the time your __dealloc__() method is called, the object may already have been partially destroyed and may not be in a valid state as far as Python is concerned, so you should avoid invoking any Python operations which might touch the object. In particular, don’t call any other methods of the object or do anything which might cause the object to be resurrected. It’s best if you stick to just deallocating C data.

http://docs.cython.org/src/userguide/special_methods.html?highlight=__dealloc__#finalization-method-dealloc
But this did not give me the crucial hint: currently, any references to Python objects may be invalid by the time __dealloc__ is called. This is because Cython generates a tp_clear function for each extension type that clears every object reference.

What I would really like to have is a way to exempt object attributes from being cleared in tp_clear, by adding a modifier to the cdef:

Exempt all attributes:

cdef noclear class Slots:
    cdef Context context

Exempt a single attribute:

cdef class Slots:
    cdef noclear Context context
This is probably explanation enough for the Cython experts, but I would still like to describe what happened in my case, along with a workaround, for illustration purposes and in the hope that this will help somebody else some day.

Background

When using the libp11 API, the client must create a Context object which is used for most operations. To query the list of slots into which a card (token) could be inserted, there is a PKCS11_enumerate_slots function which returns a newly allocated buffer containing all slots. Unfortunately this means that the resulting Slot objects cannot be deallocated individually; only the whole buffer can be released, using PKCS11_release_all_slots.
This is completely unpythonic, which is why I created a special Slots extension type that manages the storage of the Slot instances. When all Slot instances drop their reference to the Slots instance, its refcount drops to zero and Python releases it automatically.
The Slots type then calls PKCS11_release_all_slots to release the storage. This is modelled in the __dealloc__ method of Slots. Unfortunately, a pointer to the context is required for that call, so Slots also keeps a reference to the Context object, both to keep it alive long enough and to identify the underlying C object.
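The ownership scheme above can be sketched in plain Python (made-up names, not the real wrapper; relies on CPython's refcounting semantics): each Slot keeps the shared buffer object alive, and the release call runs exactly once, when the last Slot goes away.

```python
released = []

class SlotsBuffer(object):
    """Owns the whole slot buffer; it can only be freed as one unit."""
    def __del__(self):
        # stand-in for the single PKCS11_release_all_slots call
        released.append(True)

class Slot(object):
    def __init__(self, owner, index):
        self._owner = owner   # each Slot keeps the shared buffer alive
        self.index = index

buf = SlotsBuffer()
slots = [Slot(buf, i) for i in range(3)]
del buf                       # only the Slot objects keep it alive now
assert released == []
slots.pop(); slots.pop()
assert released == []         # two Slots still hold references
slots.pop()                   # last Slot gone -> refcount hits zero
assert released == [True]     # buffer released exactly once
```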

But: by the time __dealloc__ is called, the tp_clear function has already cleared the context reference. Interestingly, it does not use Py_CLEAR as mandated in the Python documentation (see http://docs.python.org/2/c-api/typeobj.html?highlight=py_clear#PyTypeObject.tp_clear and http://docs.python.org/3.3/c-api/typeobj.html?highlight=py_clear#PyTypeObject.tp_clear) but instead redirects all Python object references to Py_None, which partially hides the problem.
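To illustrate why that matters, here is a toy model in plain Python (hypothetical names, not the actual generated C code): Py_CLEAR-style clearing leaves an unmistakable NULL behind, while redirecting to a None-like stand-in leaves something that looks usable but is not the object __dealloc__ needs.

```python
CLEARED = None          # models the NULL pointer that Py_CLEAR leaves behind
FAKE_NONE = object()    # models the redirect to Py_None

class Holder(object):
    def __init__(self, context):
        self.context = context

real_context = object()

h = Holder(real_context)
h.context = CLEARED                     # Py_CLEAR style
assert h.context is None                # dealloc code can *detect* this state

h = Holder(real_context)
h.context = FAKE_NONE                   # Cython-0.19 style
assert h.context is not None            # looks like a live object...
assert h.context is not real_context    # ...but it is the wrong one
```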

This is not deterministic: tp_clear may or may not have been called before tp_dealloc, hence the random crashes. Even funnier: tp_clear only seems to be called if the Slots instance is involved in a reference cycle.

Example Code

I attached an example Cython project. If you build and install it with Cython 0.19, the following Python code will work:
from fakep11 import *

def works():
    ctx = Context()
    my_slots = ctx.slots()
    while my_slots:
        assert ctx.usage_counter == 1
        my_slots.pop()
    assert ctx.usage_counter == 0
But the following code will crash when the garbage collector runs:
def crashes():
    ctx = Context()
    slots = ctx.slots()
    a = {"slot": slots[0]}
    b = {"slot": slots[1], "cycle": a}
    a["cycle"] = b
This is because a and b refer to each other, creating a cycle, and both refer to a slot, pulling the Slots instance into the cycle's object graph. For this reason, tp_clear is called for slots before tp_dealloc, leaving tp_dealloc with invalid data.
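The same trigger can be observed with plain Python objects (a minimal sketch, assuming CPython's refcounting): acyclic objects die as soon as their refcount hits zero, while objects in a cycle wait for the cyclic collector, which is the only path that runs tp_clear before tp_dealloc.

```python
import gc
import weakref

class Node(object):
    pass

gc.disable()   # make collection timing explicit for this demo

# Acyclic: freed immediately when the last reference goes away.
n = Node()
r = weakref.ref(n)
del n
assert r() is None            # reclaimed by refcounting alone

# Cyclic: survives until the collector runs; only the collector
# invokes tp_clear before tp_dealloc.
a, b = Node(), Node()
a.other, b.other = b, a       # a <-> b reference cycle
r = weakref.ref(a)
del a, b
assert r() is not None        # still alive despite no outside references
gc.collect()
assert r() is None            # freed only by the cycle collector

gc.enable()
```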

Conclusion

  1. As somebody who has written Python extension modules manually before, I fell into this trap because I never implemented tp_clear. Reference cycles were a non-issue for me, since the wrapped objects could not refer to Python objects. Interestingly, SWIG does not seem to have any support for tp_clear either.
  2. IMHO the generated tp_clear should really use Py_CLEAR, which would at least turn this case into a reliable segfault.
  3. Generating tp_clear functions is a feature that can do some harm, therefore I propose allowing it to be disabled.

Workaround

For my current project I worked around this problem by managing the Python references manually, declaring them as PyObject* as in this diff:
diff -r f2afc4865bbc fakep11.pyx
--- a/fakep11.pyx       Sat Apr 20 23:43:00 2013 +0200
+++ b/fakep11.pyx       Sun Apr 21 23:50:10 2013 +0200
@@ -1,5 +1,10 @@
 cimport cython
 
+cdef extern from "Python.h":
+    ctypedef struct PyObject
+    void Py_INCREF(PyObject *)
+    void Py_DECREF(PyObject *)
+
 
 cdef extern from "fakep11c.h":
     ctypedef struct fakep11_context:
@@ -35,17 +40,19 @@
 @cython.internal
 cdef class Slots:
 
-    cdef Context context
+    cdef PyObject *context
     cdef unsigned int nslots
     cdef fakep11_slot *slots
 
     def __cinit__(self, Context context):
-        self.context = context
+        self.context = <PyObject*> context
+        Py_INCREF(self.context)
         self.slots = fakep11_enumerate(context.imp, &self.nslots)
 
     def __dealloc__(self):
         print "Slots dealloc"
-        fakep11_release_all(self.context.imp, self.slots, self.nslots)
+        fakep11_release_all((<Context>self.context).imp, self.slots, self.nslots)
+        Py_DECREF(self.context)
 
     def as_list(self):
         l = [None] * self.nslots
That way I have full control over the reference counting and can make sure that context is still a valid reference when __dealloc__ is called.

Please find my fake libp11 wrapper example code attached, as well as a small patch for the documentation documenting the current status.

Greetings, Torsten

-- 
DYNAmore Gesellschaft fuer Ingenieurdienstleistungen mbH
Torsten Landschoff

Office Dresden
Tel: +49-(0)351-4519587
Fax: +49-(0)351-4519561

mailto:torsten.landschoff@dynamore.de
http://www.dynamore.de

DYNAmore Gesellschaft für FEM Ingenieurdienstleistungen mbH
Registration court: Stuttgart, HRB 733694
Managing director: Prof. Dr. Karl Schweizerhof, Dipl.-Math. Ulrich Franz