[Cython] Acquisition counted cdef classes
mark florisson
markflorisson88 at gmail.com
Tue Oct 25 18:58:39 CEST 2011
On 25 October 2011 12:22, Stefan Behnel <stefan_ml at behnel.de> wrote:
> mark florisson, 25.10.2011 11:11:
>>
>> On 25 October 2011 08:33, Stefan Behnel wrote:
>>>
>>> mark florisson, 24.10.2011 21:50:
>>>>
>>>> This is in response to
>>>>
>>>>
>>>> http://groups.google.com/group/cython-users/browse_thread/thread/bcbc5fe0e329224f
>>>> and http://trac.cython.org/cython_trac/ticket/498 , and some of the
>>>> previous discussion on cython.parallel.
>>>>
>>>> Basically I think we should have something more powerful than 'cdef
>>>> borrowed CdefClass obj', something that also doesn't rely on new
>>>> syntax.
>>>
>>> We will still need borrowed reference support in the compiler eventually,
>>> whether we make it a language feature or not.
>>
>> I'm not sure I understand why; acquisition counting could solve these
>> problems for cdef classes, and general objects may not be used without
>> the GIL. Do you want this as an optimization?
>
> Yes. Think of type(x), for example, or PyDict_GetItem(). They return
> borrowed references, and in many cases, Cython wouldn't have to INCREF and
> DECREF them when they are only being used as part of some specific kinds of
> expressions. The same applies to some utility functions in Cython that
> currently must INCREF their return value unconditionally, simply because
> they can't tell Cython that they could also return a borrowed reference
> instead. If there was a way to do that, we could optimise the reference
> counting away in a couple of more places, which would get us another bit
> closer to hand-tuned code.
>
> However, note that this doesn't necessarily have an impact on nogil code. If
> you took a borrowed reference in one nogil thread, and a gil-holding thread
> deletes the object at the same time or during the lifetime of the borrowed
> reference (e.g. by updating a dict or assigning to a cdef attribute), the
> nogil thread would end up with a dead pointer in its hands. That's why the
> usage of borrowed references needs to be explicit in the code ("I know what
> I'm doing"), and the optimisations require the GIL to be held.
>
I see, ok. Thanks, that really helped me see the motivation behind it
(i.e., the INC/DECREF really is a performance issue for you).
>>>> What if we support acquisition counting for every instance of a cdef
>>>> class? In Python and Cython GIL mode you use reference counting, and
>>>> in Cython nogil mode and for struct attributes, array dtypes etc. you
>>>> use acquisition counting. This allows you to pass around cdef objects
>>>> without the GIL and use their nogil methods. If the acquisition count
>>>> is greater than 0, the acquisition count owns a reference to the
>>>> object. If it reaches 0 you discard your owned reference (you can
>>>> simply acquire the GIL if you don't have it) and when you increment
>>>> from zero you obtain it. Perhaps something like libatomic could be
>>>> used to efficiently implement this.
>>>
>>> Where would you store that count? In the object struct? That would
>>> increase the size of each instance.
>>
>> Yes, not just the count, also the lock. This feature would be optional
>> and may be very useful for people (I think).
>
> Well, as long as it's an optional feature that requires a class decorator,
> the only obvious drawback is that it'll bloat the compiler even more than it
> is already.
>
Actually, I think it will help the implementation of mutexes and async
objects if we want those, and possibly other stuff in the future. The
acquisition counting is basically already there (for memoryviews), so
it's easy to track down where and when to apply this. However, one
major problem would be circular acquisition counts, so you'd also have
to implement a garbage collector like CPython has (e.g. if you have a
cdef class with a cython.parallel.dict). We should just have a real
garbage collector instead of all the counting crap. Or we could make
it a burden for the user...

I agree that this is really not as feasible as I first thought. It
actually shows me a problem where I can have a memoryview object in a
memoryview with dtype 'object', although the problem here is that the
memoryview object doesn't traverse the object in the Py_buffer, or,
when coerced from a memoryview slice to a memoryview object, the
memoryview slice struct object... I suppose I need to fix that (but
I'm not sure how, as you can't provide a manual traverse function in
Cython).

But I really believe that these are much-wanted features. If you're
using threads in Python you only get concurrency, not parallelism,
unless you release the GIL. Even if there is some performance overhead,
it will still be a lot better than sequential execution. Perhaps when
cython.parallel is more mature, we may get functionality to
specify data distribution schemes and message passing, in which case
the GIL won't be a problem. But many things would be harder or much
more expensive, e.g. transposing, sending objects etc.

I think I'll just drop this discussion for now. I'm going to look at
how garbage collection works, how PyPy works and its GIL, and figure
out what I want.
>>>> The advantages are:
>>>>
>>>> 1) allow users to pass around cdef typed objects in nogil mode
>>>> 2) allow cdef typed objects as struct attributes or array elements
>>>> 3) make it easy to implement things like memoryviews (already done but
>>>> would have been a lot easier), cython.parallel.async/future objects,
>>>> cython.parallel.mutex objects and possibly other things in the future
>>>
>>> Would it really be easier? You can already call cdef methods in nogil
>>> mode, AFAIR.
>>
>> Sure, but you cannot store cdef objects as struct attributes, array
>> elements (you could implement it with reference counting, but not for
>> nogil mode)
>
> You could do that with borrowed references, though, assuming that you keep
> another reference around (or do your own ref-counting). However, I do see
> that keeping a real reference around may be hard to do in some cases.
>
>
>> and you cannot pass them around without the GIL.
>
> Yes, you can, as long as you only go through cdef functions. Obviously, you
> can't pass them into a Python function call, but you can (and could, if it
> was implemented) do loads of useful things with existing references even in
> nogil sections. The GIL checker is quite fine grained already but could do
> even better.
>
Ok, so cdef arguments are borrowed, which gets you somewhere but not
very far. It's rather baffling that f(x) is fine in nogil mode, but
y = x isn't.
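
To make the f(x) versus y = x asymmetry concrete, here is a minimal C
sketch with a toy reference count. The names toy_obj, read_value and
take_reference are made up for illustration and are not what Cython
generates. The only point is that a call can borrow the caller's
reference, while a second variable needs a reference of its own.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an object header; not a real PyObject. */
typedef struct {
    long refcnt;
    int value;
} toy_obj;

/* Borrowing: the caller's reference keeps o alive for the whole call,
 * so the callee never touches the count. This is the f(x) case. */
static int read_value(const toy_obj *o) {
    assert(o != NULL);
    return o->value;
}

/* Owning: a variable that may outlive the original needs its own
 * reference, so the count must change. This is the y = x case, and in
 * CPython changing the count safely is what requires the GIL. */
static toy_obj *take_reference(toy_obj *o) {
    assert(o != NULL);
    o->refcnt++;
    return o;
}
```

With borrowed references as a language feature, the second case could
skip the increment too, at the price of the caller promising that the
object stays alive.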
>> This
>> proposal is about making your life easier without the GIL, and
>> currently it's kind of a pain.
>
> The nogil sections I use are usually quite short, so I can't tell. It's
> certainly a pain to work without the GIL, because it means you have to take
> a lot more care when writing your code. But that won't change just by
> dropping reference counting. And nogil code will definitely become another
> bit harder to get right when using borrowed references.
>
>
>> Ah I assumed cpdef nogil was invalid, I see it isn't, cool.
>
> It makes perfect sense. Just because a function *can* be called without the
> GIL doesn't mean it can't be called from Python. So the Python wrapper
> requires the GIL, but the underlying cdef function doesn't.
>
>
>> This breaks terribly for special methods though.
>
> Why? It's just a matter of properly separating out their Python wrapper.
> That's why I was referring to the DefNode refactoring.
>
I see, ok. All I meant was that it currently gives you compile errors.
>>>> All of this functionality should also get a sane C API (to be provided
>>>> by cython.h). You'd get a Cy_INCREF(obj, have_gil)/Cy_DECREF() etc.
>>>> Every class using this functionality is a subclass of CythonObject
>>>> (that contains a PyObject + an acquisition count + a lock). Perhaps if
>>>> the user is subclassing something other than object we could allow the
>>>> user to specify custom __cython_(un)lock__ and
>>>> __cython_acquisition_count__ methods and fields.
>>>>
>>>> Now, building on top of this functionality, Cython could provide
>>>> built-in nogil-compatible types, like lists, dicts and maybe tuples
>>>> (as a start). These will by default not lock for operations to allow
>>>> e.g. one thread to iterate over the list and another thread to index
>>>> it without lock contention and other general overhead. If one thread
>>>> is somehow changing the size of the list, or writing to indices that
>>>> another thread is reading from/writing to, the results will of course
>>>> be undefined unless the user synchronizes on the object. So it would
>>>> be the user's responsibility. The acquisition counting itself will
>>>> always be thread-safe (i.e., it will be atomic if possible, otherwise
>>>> it will lock).
>>>>
>>>> It's probably best to not enable this functionality by default as it
>>>> would be more expensive to instantiate objects, but it could be
>>>> supported through a cdef class decorator and a general directive.
>>>
>>> It's well known that this would be expensive. One of the approaches that
>>> tried to get rid of the GIL in CPython introduced fine grained locking,
>>> and it turned out to be substantially slower, AFAIR by a factor of two.
>>
>> Sure, I am aware of that. Often you can just keep the GIL, in which
>> case you wouldn't use these types. But when you want to leave the
>> shiny world of the GIL you still want these goodies. Acquiring the GIL
>> is too expensive as there is pretty much always contention.
>
> Acquiring a more fine grained lock is more likely to reduce the contention,
> but is not necessarily less expensive. The lock still needs to get acquired
> and released. GIL protected reference counting is a lot cheaper than that,
> as is manual locking in a more coarse grained fashion.
>
Well, many processors support atomic incrementing and decrementing
counters + checking whether the counter has reached zero. So for most
architectures you wouldn't need to lock for the counting (unless you
reach a count of zero and you're going to decref your object). Any
operation would lock though, which would indeed be expensive.
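
A rough sketch of what I mean by atomic counting, using C11
<stdatomic.h>. The struct and function names (acq_counted, acq_incref,
acq_decref) are assumptions for illustration only, not an existing
Cython API. The sketch also ignores the race between the last release
and a concurrent re-acquire, which is where the lock would still be
needed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical header holding only the acquisition count. */
typedef struct {
    atomic_int acquisition_count;
} acq_counted;

/* Returns true if this call moved the count from 0 to 1, meaning the
 * caller has to obtain the owned Python reference. */
static bool acq_incref(acq_counted *o) {
    assert(o != NULL);
    /* atomic_fetch_add returns the value held before the addition. */
    return atomic_fetch_add(&o->acquisition_count, 1) == 0;
}

/* Returns true if this call moved the count from 1 to 0, meaning the
 * caller has to drop the owned reference (acquiring the GIL first if
 * it does not hold it). */
static bool acq_decref(acq_counted *o) {
    assert(o != NULL);
    /* atomic_fetch_sub returns the value held before the subtraction. */
    return atomic_fetch_sub(&o->acquisition_count, 1) == 1;
}
```

Only the two transitions through zero need anything heavier than the
atomic instruction itself.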
>>> You could potentially drop the locking for local variables, but you'd
>>> lose that ability as soon as the 'object' is passed into a function.
>>
>> Definitely, but you cannot use them with the GIL anyway :)
>
> Yes you can. For cdef functions, it's the responsibility of the caller to
> own the references of object arguments it passes. The called function
> doesn't have to do reference counting for them, as long as it doesn't try to
> reassign the variable. And even that could be fixed with borrowed
> references, and also partly by better control flow analysis.
>
Sorry, by "use" I mean "actually do something", like call a method,
look up an attribute, coerce it, etc.
>>> Basically, what you are trying to do here is to duplicate the complete
>>> ref-counting infrastructure of CPython, but without using CPython.
>>>
>>>> Of course one may still use non-cdef borrowed objects, by simply
>>>> casting to a PyObject *.
>>>
>>> That's very ugly, though, because you lose all access to methods and
>>> attributes of the object. Basically, it becomes useless that way, except
>>> for storing away a pointer to it somewhere. You could just as well use a
>>> void*.
>>
>> Indeed, and that's really all you can do without the GIL.
>
> I think you're underestimating what can (or could) be done without holding
> the GIL. There are still some open features that wait for being implemented,
> even without adding new syntax (and thus further increasing the complexity
> of the language).
>
Yeah, borrowed references definitely get you somewhere. It's just that
for supporting the parallel types that wouldn't be good enough.
>> I think
>> we're talking about different things, I'm talking about supporting
>> nogil, and you're talking about borrowed references in general.
>
> Both are related, though. It's certainly a lot easier and cleaner to support
> borrowed references in the compiler, than to implement a whole new scheme
> for handling extension type instances in addition to the normal object
> handling which we need anyway.
>
>
>> I'm
>> not sure why you'd not just take a reference instead in GIL mode,
>> unless you were worried about incrementing a counter.
>
> Decrementing it, not incrementing. :)
>
> The problem is not so much the INCREF (which is just an indirect add), it's
> the DECREF, which contains a conditional jump based on an unknown external
> value, that may trigger external code. That can kill several C compiler
> optimisations for the surrounding code. (And that would only get worse by
> using a dedicated locking mechanism.)
>
> Stefan
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel
>
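
For reference, the INCREF/DECREF asymmetry described above looks
roughly like this in plain C. This is a simplified toy model, not
CPython's actual macros. The names toy_object, toy_incref, toy_decref
and toy_count_dealloc are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Toy object header: a count plus a deallocation hook. */
typedef struct toy_object {
    long refcnt;
    void (*dealloc)(struct toy_object *);
} toy_object;

/* Counts deallocation calls so the effect can be observed. */
static int toy_dealloc_calls = 0;
static void toy_count_dealloc(toy_object *o) {
    (void)o;
    toy_dealloc_calls++;
}

/* INCREF: a single add through the pointer, no branch. */
static void toy_incref(toy_object *o) {
    o->refcnt++;
}

/* DECREF: a branch on a value the C compiler cannot predict, followed
 * by an indirect call into code it cannot see. Both limit how far the
 * surrounding code can be optimised. */
static void toy_decref(toy_object *o) {
    assert(o->refcnt > 0);
    if (--o->refcnt == 0) {
        o->dealloc(o);
    }
}
```

A borrowed reference avoids both functions entirely, which is where
the saving comes from.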
Anyway, sorry for the long mail. I agree this is likely not feasible
to implement, although I would like the functionality to be there.
Perhaps I'm trying to solve problems which don't really need to be
solved. Maybe we should just use multiprocessing, or MPI and numpy
with global arrays and pickling. Maybe memoryviews could help out with
that as well.