[Cython] Acquisition counted cdef classes

mark florisson markflorisson88 at gmail.com
Tue Oct 25 11:11:47 CEST 2011


On 25 October 2011 08:33, Stefan Behnel <stefan_ml at behnel.de> wrote:
> mark florisson, 24.10.2011 21:50:
>>
>> This is in response to
>>
>> http://groups.google.com/group/cython-users/browse_thread/thread/bcbc5fe0e329224f
>> and http://trac.cython.org/cython_trac/ticket/498 , and some of the
>> previous discussion on cython.parallel.
>>
>> Basically I think we should have something more powerful than 'cdef
>> borrowed CdefClass obj', something that also doesn't rely on new
>> syntax.
>
> We will still need borrowed reference support in the compiler eventually,
> whether we make it a language feature or not.
>

I'm not sure I understand why. Acquisition counting could solve these
problems for cdef classes, and general objects may not be used without
the GIL anyway. Do you want this as an optimization?

>> What if we support acquisition counting for every instance of a cdef
>> class? In Python and Cython GIL mode you use reference counting, and
>> in Cython nogil mode and for struct attributes, array dtypes etc. you
>> use acquisition counting. This allows you to pass around cdef objects
>> without the GIL and use their nogil methods. If the acquisition count
>> is at least 1, the acquisition count owns a reference to the
>> object. If it reaches 0 you discard your owned reference (you can
>> simply acquire the GIL if you don't have it) and when you increment
>> from zero you obtain it. Perhaps something like libatomic could be
>> used to efficiently implement this.
>
> Where would you store that count? In the object struct? That would increase
> the size of each instance.

Yes, and not just the count but also the lock. This feature would be
optional, and I think it may be very useful for people.
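
To make the counting rule concrete, here is a rough pure-Python model of
what I have in mind. It is only a sketch: the class and attribute names
are made up for illustration, 'refcount' stands in for the normal Python
reference count, and a threading.Lock stands in for whatever atomic
primitive we would actually use. Nothing like this exists in Cython today.

```python
import threading


class AcquisitionCounted:
    """Hypothetical model of an object carrying an acquisition count and a lock."""

    def __init__(self):
        self.refcount = 1               # stands in for the Python refcount
        self.acq_count = 0              # acquisitions made from nogil code
        self._lock = threading.Lock()   # stands in for an atomic counter

    def acquire(self):
        with self._lock:
            self.acq_count += 1
            if self.acq_count == 1:
                # Incrementing from zero: the count takes one owned reference.
                self.refcount += 1

    def release(self):
        with self._lock:
            if self.acq_count == 0:
                raise RuntimeError("release() without a matching acquire()")
            self.acq_count -= 1
            if self.acq_count == 0:
                # Reached zero: drop the owned reference. The real thing
                # would acquire the GIL here if it did not hold it.
                self.refcount -= 1
```

So any number of nogil acquisitions costs exactly one Python reference,
taken on the first acquire and dropped on the last release.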

>
>> The advantages are:
>>
>> 1) allow users to pass around cdef typed objects in nogil mode
>> 2) allow cdef typed objects as struct attributes or array elements
>> 3) make it easy to implement things like memoryviews (already done but
>> would have been a lot easier), cython.parallel.async/future objects,
>> cython.parallel.mutex objects and possibly other things in the future
>
> Would it really be easier? You can already call cdef methods in nogil mode,
> AFAIR.
>

Sure, but you cannot store cdef objects as struct attributes or array
elements (you could implement that with reference counting, but not for
nogil mode), and you cannot pass them around without the GIL. This
proposal is about making your life easier without the GIL; currently
it's kind of a pain.

>> We should then allow a syntax like
>>
>>     with mycdefobject:
>>         ...
>>
>> to lock the object in GIL or nogil mode (like java's 'synchronized').
>> For objects that already have __enter__ and __exit__ you could support
>> something like 'with cython.synchronized(mycdefobject): ...' instead.
>> Or perhaps you should always require cython.synchronized (or
>> cython.parallel.synchronized).
>
> The latter, I sure hope.
>
>
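
For what it's worth, the behaviour I imagine for the explicit form can be
modelled in plain Python as below. This is a sketch under assumptions:
'synchronized', 'Lockable' and '_cy_lock' are hypothetical names, and the
per-object lock is an RLock because java's 'synchronized' is re-entrant.
In the proposal the lock would live in the object struct, not in a Python
attribute.

```python
import threading
from contextlib import contextmanager


class Lockable:
    """Hypothetical base class: every instance carries its own lock."""

    def __init__(self):
        self._cy_lock = threading.RLock()   # re-entrant, like Java monitors


@contextmanager
def synchronized(obj):
    """Hold obj's lock for the duration of the with block."""
    with obj._cy_lock:
        yield obj


class Counter(Lockable):
    def __init__(self):
        super().__init__()
        self.value = 0


def worker(counter, n):
    for _ in range(n):
        with synchronized(counter):
            counter.value += 1   # read-modify-write is protected by the lock
```

Several threads running worker() on the same Counter then never lose an
increment, and the object's own __enter__/__exit__ (if any) stay untouched.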
>> In addition to nogil methods a user may provide special cdef nogil
>> methods, i.e.
>>
>> cdef int __len__(self) nogil:
>>     ...
>>
>> which would provide a Cython as well as a Python implementation for
>> the function (with automatic cpdef behaviour), so you could use it in
>> both contexts.
>
> That can already be done for final types, simply by adding cpdef behaviour
> to all special methods. That would also fix ticket #3, for example.
>
> Note that the DefNode refactoring is still pending, it would help here.
>

Ah, I assumed cpdef nogil was invalid. I see it isn't, cool. This
breaks terribly for special methods though.

>> There are two options for assignment semantics to a struct attribute
>> or array element:
>>     - decref the old value (this implies always initializing the
>> pointers to NULL first)
>>     - don't decref the old value (the user has to manually use 'del')
>>
>> I think the first option is definitely more consistent with how
>> everything else works.
>
> Yes.
>
>
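
The first option can be sketched in plain Python as follows. Again this is
only a model with hypothetical names ('Counted', 'assign'); the list of
slots stands in for struct attributes or array elements, and None stands
in for the NULL the pointers would be initialized to.

```python
class Counted:
    """Hypothetical object with an acquisition count."""

    def __init__(self, name):
        self.name = name
        self.acq_count = 0


def assign(slots, index, new):
    """Store `new` in slots[index] and release whatever was there before."""
    old = slots[index]
    if new is not None:
        new.acq_count += 1      # acquire the new value first ...
    slots[index] = new
    if old is not None:
        old.acq_count -= 1      # ... then release the old one


a, b = Counted("a"), Counted("b")
slots = [None, None]            # slots start out as NULL
assign(slots, 0, a)
assign(slots, 0, b)             # implicitly releases a
```

Acquiring before releasing keeps re-assigning the same object to a slot
safe, since its count never drops to zero in between.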
>> All of this functionality should also get a sane C API (to be provided
>> by cython.h). You'd get a Cy_INCREF(obj, have_gil)/Cy_DECREF() etc.
>> Every class using this functionality is a subclass of CythonObject
>> (that contains a PyObject + an acquisition count + a lock). Perhaps if
>> the user is subclassing something other than object we could allow the
>> user to specify custom __cython_(un)lock__ and
>> __cython_acquisition_count__ methods and fields.
>>
>> Now, building on top of this functionality, Cython could provide
>> built-in nogil-compatible types, like lists, dicts and maybe tuples
>> (as a start). These will by default not lock for operations to allow
>> e.g. one thread to iterate over the list and another thread to index
>> it without lock contention and other general overhead. If one thread
>> is somehow changing the size of the list, or writing to indices that
>> another thread is reading from/writing to, the results will of course
>> be undefined unless the user synchronizes on the object. So it would
>> be the user's responsibility. The acquisition counting itself will
>> always be thread-safe (i.e., it will be atomic if possible, otherwise
>> it will lock).
>>
>> It's probably best to not enable this functionality by default as it
>> would be more expensive to instantiate objects, but it could be
>> supported through a cdef class decorator and a general directive.
>
> It's well known that this would be expensive. One of the approaches that
> tried to get rid of the GIL in CPython introduced fine grained locking, and
> it turned out to be substantially slower, AFAIR by a factor of two.

Sure, I am aware of that. Often you can just keep the GIL, in which
case you wouldn't use these types. But when you want to leave the
shiny world of the GIL you still want these goodies. Acquiring the GIL
is too expensive as there is pretty much always contention.

> You could potentially drop the locking for local variables, but you'd lose
> that ability as soon as the 'object' is passed into a function.

Definitely, but you cannot use them with the GIL anyway :)

> Basically, what you are trying to do here is to duplicate the complete
> ref-counting infrastructure of CPython, but without using CPython.
>
>
>> Of course one may still use non-cdef borrowed objects, by simply
>> casting to a PyObject *.
>
> That's very ugly, though, because you lose all access to methods and
> attributes of the object. Basically, it becomes useless that way, except for
> storing away a pointer to it somewhere. You could just as well use a void*.

Indeed, and that's really all you can do without the GIL. I think
we're talking about different things: I'm talking about supporting
nogil, and you're talking about borrowed references in general. I'm
not sure why you wouldn't just take a reference instead in GIL mode,
unless you were worried about the cost of incrementing a counter.

> Stefan