[Python-Dev] PEP 554 v3 (new interpreters module)

Thu Oct 5 04:45:26 EDT 2017

On Tue, Oct 3, 2017 at 8:55 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> I think we need a sharing protocol, not just a flag.  We also need to
> think carefully about that protocol, so that it does not imply
> unnecessary memory copies.  Therefore I think the protocol should be
> something like the buffer protocol, that allows to acquire and release
> a set of shared memory areas, but without imposing any semantics onto
> those memory areas (each type implementing its own semantics).  And
> there needs to be a dedicated reference counting for object shares, so
> that the original object can be notified when all its shares have
> vanished.

I've come to agree. :)  I actually came to the same conclusion tonight
before I'd been able to read through your message carefully.  My idea
is below.  Your suggestion about protecting shared memory areas is
something to discuss further, though I'm not sure it's strictly
necessary yet (before we stop sharing the GIL).

On Wed, Oct 4, 2017 at 7:41 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> Having the sending interpreter do the INCREF just changes the problem
> to be a memory leak waiting to happen rather than an access-after-free
> issue, since the problematic non-synchronised scenario then becomes:
>
> * thread on CPU A has two references (ob_refcnt=2)
> * it sends a reference to a thread on CPU B via a channel
> * thread on CPU A releases its reference (ob_refcnt=1)
> * updated ob_refcnt value hasn't made it back to the shared memory cache yet
> * thread on CPU B releases its reference (ob_refcnt=1)
> * both threads have released their reference, but the refcnt is still
> 1 -> object leaks!
>
> We simply can't have INCREFs and DECREFs happening in different
> threads without some way of ensuring cache coherency for *both*
> operations - otherwise we risk either the refcount going to zero when
> it shouldn't, or *not* going to zero when it should.
>
> The current CPython implementation relies on the process global GIL
> for that purpose, so none of these problems will show up until you
> start trying to replace that with per-interpreter locks.
>
> Free threaded reference counting relies on (expensive) atomic
> increments & decrements.

Right.  I'm not sure why I was missing that, but I'm clear now.
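
To make Nick's scenario concrete, here is a toy, deterministic replay of it. Memory and Cpu are hypothetical stand-ins that I made up for illustration. They are not CPython internals and not a faithful model of real hardware. They only show how two unsynchronised read-modify-write decrements can lose an update.

```python
# Toy replay of the stale-refcount scenario quoted above.
# "Memory" and "Cpu" are hypothetical names, not CPython internals.

class Memory:
    """Shared memory holding a single refcount."""

    def __init__(self, refcnt):
        self.refcnt = refcnt


class Cpu:
    """A CPU whose writes sit in a private cache until flushed."""

    def __init__(self, memory):
        self.memory = memory
        self.cached = None  # None means "nothing cached yet"

    def decref(self):
        # Non-atomic read-modify-write against whatever value this CPU sees.
        seen = self.cached if self.cached is not None else self.memory.refcnt
        self.cached = seen - 1

    def flush(self):
        # Write the cached value back to shared memory.
        if self.cached is not None:
            self.memory.refcnt = self.cached
            self.cached = None


def replay():
    mem = Memory(refcnt=2)   # one reference held by A, one sent to B
    cpu_a, cpu_b = Cpu(mem), Cpu(mem)
    cpu_a.decref()           # A releases; the new value (1) is only in A's cache
    cpu_b.decref()           # B still sees 2 in memory and also computes 1
    cpu_a.flush()
    cpu_b.flush()
    return mem.refcnt


final_refcnt = replay()      # both references released, yet the count is not 0
```

With a lock (or atomic decrements) around decref, the second decrement would see the first one's result and the count would reach zero.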

Below is a rough idea of what I think may work instead (the result of
much tossing and turning in bed*).

While we're still sharing a GIL between interpreters:

Channel.send(obj):  # in interp A
    if type(obj).tp_share == NULL:
        raise ValueError("not a shareable type")
    incref(obj)  # only after the check, so a rejected obj is not leaked
    ch.objects.append(obj)

Channel.recv():  # in interp B
    orig = ch.objects.pop(0)
    obj = orig.tp_share()
    return obj

bytes.tp_share():
    return self
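
For anyone who wants to poke at the shared-GIL version, here is a minimal runnable sketch in pure Python. It assumes a hypothetical __share__ method standing in for the proposed tp_share slot, and a SharedBytes subclass because the builtin bytes has no such hook. Neither name exists in CPython today. Holding the object in the queue plays the role of the incref.

```python
from collections import deque


class Channel:
    """Sketch of the shared-GIL channel; __share__ stands in for tp_share."""

    def __init__(self):
        self.objects = deque()

    def send(self, obj):
        # Reject types that lack the (hypothetical) share hook.
        if getattr(type(obj), "__share__", None) is None:
            raise ValueError("not a shareable type")
        # Keeping obj in the deque holds a reference, like the incref.
        self.objects.append(obj)

    def recv(self):
        # Raises IndexError if nothing has been sent yet.
        orig = self.objects.popleft()
        return type(orig).__share__(orig)


class SharedBytes(bytes):
    """bytes subclass carrying the hypothetical share hook."""

    def __share__(self):
        # Immutable, so under a shared GIL the same object can be handed over.
        return self


ch = Channel()
payload = SharedBytes(b"spam")
ch.send(payload)
received = ch.recv()

try:
    ch.send([1, 2, 3])  # list has no __share__, so this is rejected
    rejected = False
except ValueError:
    rejected = True
```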

After we move to not sharing the GIL between interpreters:

Channel.send(obj):  # in interp A
    if type(obj).tp_share == NULL:
        raise ValueError("not a shareable type")
    incref(obj)  # only after the check, so a rejected obj is not leaked
    set_owner(obj)  # obj.owner or add an obj -> interp entry to global table
    ch.objects.append(obj)

Channel.recv():  # in interp B
    orig = ch.objects.pop(0)
    obj = orig.tp_share()
    set_shared(obj, orig)  # add to a global table
    return obj

bytes.tp_share():
    obj = blank_bytes(len(self))
    obj.ob_sval = self.ob_sval # hand-wavy memory sharing
    return obj

bytes.tp_free():  # under no-shared-GIL:
    # most of this could be pulled into a macro for re-use
    orig = lookup_shared(self)
    if orig != NULL:
        current = release_LIL()
        interp = lookup_owner(orig)
        acquire_LIL(interp)
        decref(orig)
        release_LIL(interp)
        acquire_LIL(current)
        # clear shared/owner tables
        # clear/release self.ob_sval
    free(self)
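
The no-shared-GIL bookkeeping can be mocked up in pure Python too. This is only a sketch under heavy assumptions: Interp, Shareable, owners, shared and the explicit refcounts dict are all hypothetical stand-ins, a threading.Lock plays the per-interpreter lock, and the global tables are left unprotected here even though a real implementation would need to guard them.

```python
import threading


class Interp:
    """Hypothetical interpreter with its own lock (the "LIL" above)."""

    def __init__(self, name):
        self.name = name
        self.lock = threading.Lock()
        self.refcounts = {}  # id(obj) -> count, only touched under self.lock


owners = {}  # id(orig) -> owning Interp  (the "owner" table)
shared = {}  # id(copy) -> orig           (the "shared" table)


class Shareable:
    def __init__(self, data):
        self.data = data

    def share(self):
        # Plays the role of tp_share: new wrapper, same underlying data.
        return Shareable(self.data)


def send(queue, obj, sender):
    with sender.lock:
        sender.refcounts[id(obj)] = sender.refcounts.get(id(obj), 0) + 1
    owners[id(obj)] = sender  # set_owner(obj)
    queue.append(obj)


def recv(queue):
    orig = queue.pop(0)
    obj = orig.share()
    shared[id(obj)] = orig    # set_shared(obj, orig)
    return obj


def free(obj, current):
    """Mirror of the tp_free outline; the caller must hold current.lock."""
    orig = shared.pop(id(obj), None)
    if orig is not None:
        owner = owners[id(orig)]
        current.lock.release()
        try:
            # Decref the original only while holding its owner's lock.
            with owner.lock:
                owner.refcounts[id(orig)] -= 1
                if owner.refcounts[id(orig)] == 0:
                    del owner.refcounts[id(orig)]
                    del owners[id(orig)]
        finally:
            current.lock.acquire()


interp_a, interp_b = Interp("A"), Interp("B")
queue = []
original = Shareable(b"spam")
send(queue, original, interp_a)
with interp_b.lock:           # B runs under its own lock
    received = recv(queue)
    same_data = received.data is original.data
    free(received, interp_b)
```

The point of the lock swap in free() is that the original's refcount is only ever modified while its owner's lock is held, which is what avoids the race Nick described.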

The CIV approach could be facilitated through something like a new
SharedBuffer type, or through a separate BufferViewChannel, etc.

Most notably, this approach avoids hard-coding specific type support
into channels and should work out fine under no-shared-GIL
subinterpreters.  One nice thing about the tp_share slot is that it
makes it much easier (along with a C-API for managing the global
owned/shared tables) to implement other types that are legal to pass
through channels.  Such types could be provided via extension modules.
Numpy arrays could be made to support it, if that's your thing.
Antoine could give tp_share to locks and semaphores. :)  Of course,
any such types would have to ensure that they are actually safe to
share between interpreters without a GIL between them...

For PEP 554, I'd only propose the tp_share slot and its use in
Channel.send()/.recv().  The parts related to global tables and memory
sharing and tp_free() wouldn't be necessary until we stop sharing the
GIL between interpreters.  However, I believe that tp_share would make
us ready for that.

-eric

* I should know by now that some ideas sound better in the middle of
the night than they do the next day, but this idea is keeping me awake
so I'll risk it! :)