[Numpy-discussion] Releasing the GIL in ufuncs dealing with object arrays

Wed Aug 21 12:42:14 EDT 2019

On Tue, 2019-08-20 at 22:30 +0200, Joris Van den Bossche wrote:
> Hi Sebastian,
> 
> Thanks for the answer!
> 
> On Mon, 19 Aug 2019 at 17:57, Sebastian Berg <
> sebastian at sipsolutions.net> wrote:
> > ...
> > 
> > Hmmm, interesting use case. No, I do not think there currently is a
> > reasonable way to do this (I think there may be ways to hack it).
> > Even when all access to the objects is safe by itself, you still
> > have the problem that the object stored inside the array could be
> > replaced (and invalidated) at any time if you run multithreaded.
> 
> Would it help to have a custom dtype that ensures that all objects in
> the array are of this specific extension type? (I don't know whether
> a custom dtype (done in C, like the quaternion example) is possible
> for storing Python objects.)
>  

You can do a custom dtype like the quaternion one. We are working on
creating new custom dtypes, but it will be a while until that lands.
One thing I am not quite sure about is whether it is currently possible
to do an object-backed dtype (the issue is whether the reference
counting is done -- especially without adding other issues). I could
have a look if you like.

Making that easy is very high up on the "what I want in the future"
list.

> > We would like to type such objects in the future; even then, I am
> > not sure how to make things safe against race conditions if
> > elements are replaced (and deleted).
> > 
> > This is an interesting use case, since arrays of pointers (or
> > specific pyobjects) will always have this type of issue, and I am
> > not sure how you would avoid it (a cheap lock on the object itself
> > works probably, but even if it is cheap, it is probably fairly
> > expensive?).
> 
> Currently, we are thinking of doing two loops in the ufunc: a first
> one for getting all the pointers into a C array, and then a second
> one, with the GIL manually released, doing the actual operation on
> the array of pointers. See
> https://github.com/caspervdw/pygeos/issues/7 for example code. From a
> quick experiment, that seems to give only a small overhead (in the
> single-threaded case).

I suppose that should work. If you are within an inner loop, you have
only limited control over the chunking/buffersize though, so in the
worst case you might be releasing the GIL very often.
I suppose in the event that the array is not writeable, you could
actually release the GIL.
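
A rough Python-level sketch of the two points above (not code from this
thread; the array contents and the variable names are made up for
illustration). It shows that a read-only array rejects element
replacement, but also that the writeable flag belongs to the array
object rather than to the memory, so a writeable base can still change
what a read-only view sees. It also prints the ufunc buffer size that
NumPy uses when it has to buffer operands:

```python
import numpy as np

# Made-up object array standing in for an array of extension-type objects.
base = np.array([object(), object(), object()], dtype=object)

# A second array object looking at the same memory, marked read-only.
view = base.view()
view.setflags(write=False)

# Replacing an element through the read-only view is rejected.
try:
    view[0] = None
    blocked = False
except ValueError:
    blocked = True

# The flag is per array object: the base is still writeable, so the
# element the view sees can still be swapped out from under it.
old = view[0]
base[0] = None
changed_under_view = view[0] is not old

# Buffer size (in elements) used by ufuncs when operands are buffered.
bufsize = np.getbufsize()

print(blocked, changed_under_view, bufsize)
```

So "not writeable" is only a real guarantee if nothing else holds a
writeable reference to the same data.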

This is something that we are thinking about enabling full control
over, although it is not high on the list of priorities right now.
(Basically, my plan/thought is to start off without allowing such
things, but to keep it open for later addition.)

In practice I suppose that such objects (and ufuncs) can be fairly
heavy, so that even indiscriminately copying the full input arrays
really is not a big issue as such.

> 
> That of course still has the same problems as you mentioned (although
> in our case, we are, in principle, the holders of the array and know
> what it contains, and the individual extension type objects are not
> mutable), but then at least it is our own responsibility to make
> sure that the array contains the correct objects and is not mutated.
> 

I understood it as: You copy+incref. After that, all seems OK to me,
unless your object itself is mutable (in a non-threadsafe way).
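
As a rough Python-level analogue of copy+incref (a sketch only; the
real thing would be C inside the ufunc loop, and the `Geom` class and
variable names here are hypothetical stand-ins): taking a snapshot of
the references keeps each object alive, so replacing an element in the
array afterwards does not invalidate what the second pass works on.

```python
import sys
import numpy as np

class Geom:
    """Hypothetical stand-in for an immutable extension-type object."""
    def __init__(self, value):
        self.value = value

arr = np.array([Geom(1.0), Geom(2.0), Geom(3.0)], dtype=object)

first = arr[0]
before = sys.getrefcount(first)

# Pass 1 (GIL held in the C version): copy the pointers and take a
# new reference to each object. The list owns its own references.
snapshot = arr.tolist()
after = sys.getrefcount(first)

# Something else replaces an element of the array in the meantime.
arr[0] = Geom(-1.0)

# Pass 2 (where the C version would release the GIL): work only on the
# snapshot, which still holds the original, still-alive objects.
result = [g.value * 2 for g in snapshot]

print(after > before, snapshot[0] is first, result)
```

This only covers the lifetime of the objects; it does nothing for an
object that is itself mutated from another thread.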

Best,

Sebastian

> Joris
>  