On Tue, Oct 3, 2017 at 8:55 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
I think we need a sharing protocol, not just a flag. We also need to think carefully about that protocol, so that it does not imply unnecessary memory copies. Therefore I think the protocol should be something like the buffer protocol, which allows one to acquire and release a set of shared memory areas, but without imposing any semantics onto those memory areas (each type implementing its own semantics). And there needs to be dedicated reference counting for object shares, so that the original object can be notified when all its shares have vanished.
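For readers less familiar with the buffer protocol mentioned above, here is a minimal sketch of its acquire/release behaviour using the existing memoryview and bytearray types. It only illustrates the in-process analogue of what is being asked for; it is not the cross-interpreter protocol under discussion, and the variable names are made up for the example.

```python
# Illustration only: memoryview is today's buffer-protocol consumer,
# not the cross-interpreter sharing protocol proposed in this thread.
data = bytearray(b"spam")

view = memoryview(data)   # acquire: exposes data's memory without copying
view[0] = ord("S")        # writes go straight through to the original object

try:
    data.append(0x21)     # resizing is refused while an export is outstanding
    resized_while_shared = True
except BufferError:
    resized_while_shared = False

view.release()            # release: the exporter learns the share is gone
data.append(0x21)         # now the original may resize again
```

The point of the example is the bookkeeping: the original object knows how many exports are outstanding and changes its behaviour once they have all been released, which is the property the paragraph above asks for in object shares.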
I've come to agree. :) I actually came to the same conclusion tonight before I'd been able to read through your message carefully. My idea is below. Your suggestion about protecting shared memory areas is something to discuss further, though I'm not sure it's strictly necessary yet (before we stop sharing the GIL).

On Wed, Oct 4, 2017 at 7:41 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Having the sending interpreter do the INCREF just changes the problem to be a memory leak waiting to happen rather than an access-after-free issue, since the problematic non-synchronised scenario then becomes:
* thread on CPU A has two references (ob_refcnt=2)
* it sends a reference to a thread on CPU B via a channel
* thread on CPU A releases its reference (ob_refcnt=1)
* updated ob_refcnt value hasn't made it back to the shared memory cache yet
* thread on CPU B releases its reference (ob_refcnt=1)
* both threads have released their reference, but the refcnt is still 1 -> object leaks!
We simply can't have INCREFs and DECREFs happening in different threads without some way of ensuring cache coherency for *both* operations - otherwise we risk either the refcount going to zero when it shouldn't, or *not* going to zero when it should.
The current CPython implementation relies on the process global GIL for that purpose, so none of these problems will show up until you start trying to replace that with per-interpreter locks.
Free threaded reference counting relies on (expensive) atomic increments & decrements.
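The leak scenario above can be modelled as a small simulation. This is a sketch under stated assumptions: the stale cache is modelled by having both "CPUs" read the count before either write lands, a threading.Lock stands in for the atomic operations, and Obj, SyncedObj and both function names are hypothetical.

```python
import threading


class Obj:
    """Hypothetical stand-in for an object header with a refcount."""

    def __init__(self, refcnt):
        self.ob_refcnt = refcnt


def stale_decref_scenario():
    """Replay the unsynchronised sequence with an explicit stale read."""
    obj = Obj(2)
    seen_by_a = obj.ob_refcnt       # CPU A reads 2
    seen_by_b = obj.ob_refcnt       # CPU B also reads 2 (A's write not visible yet)
    obj.ob_refcnt = seen_by_a - 1   # A releases its reference: writes 1
    obj.ob_refcnt = seen_by_b - 1   # B releases from the stale 2: writes 1 again
    return obj.ob_refcnt            # never reaches 0, so the object leaks


class SyncedObj:
    """Same header, but every decrement is a guarded read-modify-write."""

    def __init__(self, refcnt):
        self.ob_refcnt = refcnt
        self._lock = threading.Lock()  # assumption: models an atomic decrement

    def decref(self):
        with self._lock:
            self.ob_refcnt -= 1
            return self.ob_refcnt


def synced_scenario():
    """Two threads each release one reference through the guarded path."""
    obj = SyncedObj(2)
    threads = [threading.Thread(target=obj.decref) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return obj.ob_refcnt
```

In the first function the count ends at 1 even though both references were dropped; in the second it ends at 0 because no decrement can start from a stale value.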
Right. I'm not sure why I was missing that, but I'm clear now. Below is a rough idea of what I think may work instead (the result of much tossing and turning in bed*).

While we're still sharing a GIL between interpreters:

    Channel.send(obj):  # in interp A
        if type(obj).tp_share == NULL:
            raise ValueError("not a shareable type")
        incref(obj)
        ch.objects.append(obj)

    Channel.recv():  # in interp B
        orig = ch.objects.pop(0)
        obj = orig.tp_share()
        return obj

    bytes.tp_share():
        return self

After we move to not sharing the GIL between interpreters:

    Channel.send(obj):  # in interp A
        if type(obj).tp_share == NULL:
            raise ValueError("not a shareable type")
        incref(obj)
        set_owner(obj)  # obj.owner or add an obj -> interp entry to global table
        ch.objects.append(obj)

    Channel.recv():  # in interp B
        orig = ch.objects.pop(0)
        obj = orig.tp_share()
        set_shared(obj, orig)  # add to a global table
        return obj

    bytes.tp_share():
        obj = blank_bytes(len(self))
        obj.ob_sval = self.ob_sval  # hand-wavy memory sharing
        return obj

    bytes.tp_free():  # under no-shared-GIL:
        # most of this could be pulled into a macro for re-use
        orig = lookup_shared(self)
        if orig != NULL:
            current = release_LIL()
            interp = lookup_owner(orig)
            acquire_LIL(interp)
            decref(orig)
            release_LIL(interp)
            acquire_LIL(current)
            # clear shared/owner tables
            # clear/release self.ob_sval
        free(self)

The CIV approach could be facilitated through something like a new SharedBuffer type, or through a separate BufferViewChannel, etc.

Most notably, this approach avoids hard-coding specific type support into channels and should work out fine under no-shared-GIL subinterpreters. One nice thing about the tp_share slot is that it makes it much easier (along with a C-API for managing the global owned/shared tables) to implement other types that are legal to pass through channels. Such types could be provided via extension modules. Numpy arrays could be made to support it, if that's your thing. Antoine could give tp_share to locks and semaphores.
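The shared-GIL variant of the idea can be tried out in pure Python. This is only a sketch under assumptions: the TP_SHARE dict is a hypothetical stand-in for the proposed tp_share slot, the Channel class here is not the PEP 554 API, and holding the object in the queue plays the role of incref(obj).

```python
from collections import deque

# Hypothetical registry standing in for the proposed tp_share slot:
# maps a type to a function that produces the receiver-side object.
TP_SHARE = {
    bytes: lambda self: self,  # immutable, so the same object is handed over
}


class Channel:
    """Toy channel: only types with a registered share function may pass."""

    def __init__(self):
        self._objects = deque()

    def send(self, obj):  # would run in interp A
        if type(obj) not in TP_SHARE:
            raise ValueError("not a shareable type")
        # The queue's reference keeps obj alive until it is received.
        self._objects.append(obj)

    def recv(self):  # would run in interp B
        orig = self._objects.popleft()
        return TP_SHARE[type(orig)](orig)


ch = Channel()
ch.send(b"spam")
received = ch.recv()

try:
    ch.send([1, 2, 3])  # list has no share function registered
    list_rejected = False
except ValueError:
    list_rejected = True
```

Supporting another type means adding one entry to the registry, which mirrors how an extension type would opt in by filling the slot rather than by changing the channel code.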
:) Of course, any such types would have to ensure that they are actually safe to share between interpreters without a GIL between them...

For PEP 554, I'd only propose the tp_share slot and its use in Channel.send()/.recv(). The parts related to global tables and memory sharing and tp_free() wouldn't be necessary until we stop sharing the GIL between interpreters. However, I believe that tp_share would make us ready for that.

-eric

* I should know by now that some ideas sound better in the middle of the night than they do the next day, but this idea is keeping me awake so I'll risk it! :)