[Cython] Acquisition counted cdef classes

Dag Sverre Seljebotn d.s.seljebotn at astro.uio.no
Tue Oct 25 15:28:57 CEST 2011

On 10/25/2011 09:33 AM, Stefan Behnel wrote:
> mark florisson, 24.10.2011 21:50:
>> This is in response to
>> http://groups.google.com/group/cython-users/browse_thread/thread/bcbc5fe0e329224f
>> and http://trac.cython.org/cython_trac/ticket/498 , and some of the
>> previous discussion on cython.parallel.
>> Basically I think we should have something more powerful than 'cdef
>> borrowed CdefClass obj', something that also doesn't rely on new
>> syntax.
> We will still need borrowed reference support in the compiler
> eventually, whether we make it a language feature or not.
>> What if we support acquisition counting for every instance of a cdef
>> class? In Python and Cython GIL mode you use reference counting, and
>> in Cython nogil mode and for struct attributes, array dtypes etc. you
>> use acquisition counting. This allows you to pass around cdef objects
>> without the GIL and use their nogil methods. If the acquisition count
>> is at least 1, the acquisition count owns a reference to the
>> object. If it reaches 0 you discard your owned reference (you can
>> simply acquire the GIL if you don't have it) and when you increment
>> from zero you obtain it. Perhaps something like libatomic could be
>> used to efficiently implement this.
> Where would you store that count? In the object struct? That would
> increase the size of each instance.
>> The advantages are:
>> 1) allow users to pass around cdef typed objects in nogil mode
>> 2) allow cdef typed objects as struct attributes or array elements
>> 3) make it easy to implement things like memoryviews (already done but
>> would have been a lot easier), cython.parallel.async/future objects,
>> cython.parallel.mutex objects and possibly other things in the future
> Would it really be easier? You can already call cdef methods in nogil
> mode, AFAIR.
>> We should then allow a syntax like
>>     with mycdefobject:
>>         ...
>> to lock the object in GIL or nogil mode (like java's 'synchronized').
>> For objects that already have __enter__ and __exit__ you could support
>> something like 'with cython.synchronized(mycdefobject): ...' instead.
>> Or perhaps you should always require cython.synchronized (or
>> cython.parallel.synchronized).
> The latter, I sure hope.
>> In addition to nogil methods a user may provide special cdef nogil
>> methods, i.e.
>>     cdef int __len__(self) nogil:
>>         ...
>> which would provide a Cython as well as a Python implementation for
>> the function (with automatic cpdef behaviour), so you could use it in
>> both contexts.
> That can already be done for final types, simply by adding cpdef
> behaviour to all special methods. That would also fix ticket #3, for
> example.
> Note that the DefNode refactoring is still pending, it would help here.
>> There are two options for assignment semantics to a struct attribute
>> or array element:
>> - decref the old value (this implies always initializing the
>> pointers to NULL first)
>> - don't decref the old value (the user has to manually use 'del')
>> I think 1) is definitely more consistent with how everything else works.
> Yes.
>> All of this functionality should also get a sane C API (to be provided
>> by cython.h). You'd get a Cy_INCREF(obj, have_gil)/Cy_DECREF() etc.
>> Every class using this functionality is a subclass of CythonObject
>> (that contains a PyObject + an acquisition count + a lock). Perhaps if
>> the user is subclassing something other than object we could allow the
>> user to specify custom __cython_(un)lock__ and
>> __cython_acquisition_count__ methods and fields.
>> Now, building on top of this functionality, Cython could provide
>> built-in nogil-compatible types, like lists, dicts and maybe tuples
>> (as a start). These will by default not lock for operations to allow
>> e.g. one thread to iterate over the list and another thread to index
>> it without lock contention and other general overhead. If one thread
>> is somehow changing the size of the list, or writing to indices that
>> another thread is reading from/writing to, the results will of course
>> be undefined unless the user synchronizes on the object. So it would
>> be the user's responsibility. The acquisition counting itself will
>> always be thread-safe (i.e., it will be atomic if possible, otherwise
>> it will lock).
>> It's probably best to not enable this functionality by default as it
>> would be more expensive to instantiate objects, but it could be
>> supported through a cdef class decorator and a general directive.
> It's well known that this would be expensive. One of the approaches that
> tried to get rid of the GIL in CPython introduced fine grained locking,
> and it turned out to be substantially slower, AFAIR by a factor of two.

I'd gladly take a factor of two (or even four) slowdown of CPython code any
day to get rid of the GIL :-). The thing is, sometimes one has 48 cores
and considers a 10x speedup better than nothing...

Dag Sverre
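
For reference, the acquisition-counting scheme from mark's proposal quoted
above might be sketched in C roughly as follows. This is only an
illustration under assumptions: cy_object, cy_acquire and cy_release are
hypothetical names (not part of Cython or the proposed cython.h), the plain
refcount field merely stands in for the Python reference count, and C11
<stdatomic.h> is swapped in for the libatomic idea.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a cdef class instance; not Cython's real layout. */
typedef struct {
    long refcount;                 /* stands in for the GIL-protected Python refcount */
    atomic_long acquisition_count; /* counted atomically, without the GIL */
} cy_object;

static void cy_object_init(cy_object *obj) {
    obj->refcount = 1;             /* one reference held by the creator */
    atomic_init(&obj->acquisition_count, 0);
}

/* Acquire: the 0 -> 1 transition takes one owned reference. */
static void cy_acquire(cy_object *obj) {
    if (atomic_fetch_add(&obj->acquisition_count, 1) == 0) {
        /* Real code would have to hold (or take) the GIL at this point. */
        obj->refcount++;
    }
}

/* Release: the 1 -> 0 transition gives the owned reference back. */
static void cy_release(cy_object *obj) {
    if (atomic_fetch_sub(&obj->acquisition_count, 1) == 1) {
        /* Real code would have to hold (or take) the GIL at this point. */
        obj->refcount--;
    }
}

/* Single-threaded walk-through; returns the final reference count. */
static long cy_demo(void) {
    cy_object obj;
    cy_object_init(&obj);
    cy_acquire(&obj);   /* acquisition count 1, refcount 2 */
    cy_acquire(&obj);   /* acquisition count 2, refcount 2 */
    cy_release(&obj);   /* acquisition count 1, refcount 2 */
    cy_release(&obj);   /* acquisition count 0, refcount 1 */
    return obj.refcount;
}
```

Only the first acquisition and the last release touch the reference count,
so everything in between can run in nogil code. The sketch does not handle
a release to zero racing with a new acquisition from zero; a real
implementation would need to serialize those two transitions.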
