[Python-Dev] PEP 554 v3 (new interpreters module)

Nick Coghlan ncoghlan at gmail.com
Wed Oct 4 21:41:18 EDT 2017


On 4 October 2017 at 23:51, Eric Snow <ericsnowcurrently at gmail.com> wrote:
> On Tue, Oct 3, 2017 at 11:36 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> The problem relates to the fact that there aren't any memory barriers
>> around CPython's INCREF operations (they're implemented as an ordinary
>> C post-increment operation), so you can get the following scenario:
>>
>> * thread on CPU A has the sole reference (ob_refcnt=1)
>> * thread on CPU B acquires a new reference, but hasn't pushed the
>> updated ob_refcnt value back to the shared memory cache yet
>> * original thread on CPU A drops its reference, *thinks* the refcnt is
>> now zero, and deletes the object
>> * bad things now happen in CPU B as the thread running there tries to
>> use a deleted object :)
>
> I'm not clear on where we'd run into this problem with channels.
> Mirroring your scenario:
>
> * interpreter A (in thread on CPU A) INCREFs the object (the GIL is still held)
> * interp A sends the object to the channel
> * interp B (in thread on CPU B) receives the object from the channel
> * the new reference is held until interp B DECREFs the object
>
> From what I see, at no point do we get a refcount of 0, such that
> there would be a race on the object being deleted.

Having the sending interpreter do the INCREF just changes the problem
from an access-after-free bug into a memory leak waiting to happen,
since the problematic non-synchronised scenario then becomes:

* thread on CPU A holds two references (ob_refcnt=2)
* it sends a reference to a thread on CPU B via a channel
* thread on CPU A releases its reference (ob_refcnt=1)
* the updated ob_refcnt value hasn't made it back to the shared memory
cache yet
* thread on CPU B releases its reference, decrementing the stale value
(so it also computes ob_refcnt=1)
* both threads have released their references, but the refcnt is still
1 -> the object leaks!
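The leak scenario above can be sketched as a toy simulation, modelling
each CPU's pending write-back as a stale local copy of ob_refcnt (the
variable names are illustrative, not CPython internals):

```python
# Toy model of the unsynchronised refcount scenario above.
# Each "CPU" works on a cached copy of ob_refcnt and writes it
# back to shared memory later.

shared_refcnt = 2          # ob_refcnt as seen in shared memory

# CPU A: DECREF against its cached copy; the write-back is delayed
cached_a = shared_refcnt   # CPU A loads ob_refcnt (sees 2)
cached_a -= 1              # CPU A's DECREF -> 1 (not yet visible)

# CPU B: DECREF against the still-stale shared value
cached_b = shared_refcnt   # CPU B also sees 2
cached_b -= 1              # CPU B's DECREF -> 1

# The write-backs race; whichever store lands last, the answer is
# the same stale 1, not the correct 0
shared_refcnt = cached_a
shared_refcnt = cached_b

assert shared_refcnt == 1  # both refs released, object leaks
```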

We simply can't have INCREFs and DECREFs happening in different
threads without some way of ensuring cache coherency for *both*
operations - otherwise we risk either the refcount going to zero when
it shouldn't, or *not* going to zero when it should.

The current CPython implementation relies on the process global GIL
for that purpose, so none of these problems will show up until you
start trying to replace that with per-interpreter locks.

Free-threaded reference counting instead relies on (expensive) atomic
increments and decrements.
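As a rough sketch of what those atomics buy, here's the same
incref/decref traffic made safe with a per-object lock standing in for
hardware atomic operations (a toy model, not CPython's actual
implementation):

```python
import threading

class RefCounted:
    """Toy refcount guarded by a per-object lock, standing in for
    the atomic increments/decrements a free-threaded design uses."""

    def __init__(self):
        self._refcnt = 1               # the creating reference
        self._lock = threading.Lock()

    def incref(self):
        with self._lock:
            self._refcnt += 1

    def decref(self):
        with self._lock:
            self._refcnt -= 1
            return self._refcnt       # 0 means it is safe to free

obj = RefCounted()
# Eight threads each take and release one reference concurrently;
# the lock makes every update visible to every other thread.
threads = [threading.Thread(target=lambda: (obj.incref(), obj.decref()))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

final = obj.decref()   # drop the creating reference
assert final == 0      # the count is exact: no leak, no early free
```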

The cross-interpreter view (CIV) proposal aims to allow
per-interpreter GILs without introducing atomic increments and
decrements: instead, the view itself ensures that it holds the right
GIL for the object whose refcount it manipulates, and the receiving
interpreter explicitly closes the view when it's done with it.

So while CIVs wouldn't be as easy to use as regular object references:

1. They'd be no harder to use than memoryviews in general
2. They'd structurally ensure that regular object refcounts can still
rely on "protected by the GIL" semantics
3. They'd structurally ensure zero performance degradation for regular
object refcounts
4. By virtue of being memoryview based, they'd encourage the adoption
of interfaces and practices that can be adapted to multiple processes
through the use of techniques like shared memory regions and memory
mapped files (see
http://www.boost.org/doc/libs/1_54_0/doc/html/interprocess/sharedmemorybetweenprocesses.html
for some detailed explanations of how that works, and
https://arrow.apache.org/ for an example of ways tools like Pandas can
use that to enable zero-copy data sharing)
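As a small illustration of point 4, Python's own
multiprocessing.shared_memory module (Python 3.8+) exposes a shared
memory region as a memoryview that multiple attachments can read
without copying the payload:

```python
from multiprocessing import shared_memory

# Writer side: create a named shared memory region and write into it
# through its memoryview.
shm = shared_memory.SharedMemory(create=True, size=16)
shm.buf[:5] = b"hello"

# Reader side (normally a different process): attach to the same
# region by name and read the bytes without copying the region itself.
peer = shared_memory.SharedMemory(name=shm.name)
data = bytes(peer.buf[:5])   # copies out only for inspection

peer.close()                 # each attachment closes its own view
shm.close()
shm.unlink()                 # the creator removes the region

assert data == b"hello"
```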

> The only problem I'm aware of (it dawned on me last night), is in the
> case that the interpreter that created the object gets deleted before
> the object does.  In that case we can't pass the deletion back to the
> original interpreter.  (I don't think this problem is necessarily
> exclusive to the solution I've proposed for Bytes.)

The cross-interpreter-view idea proposes to deal with that by having
the CIV hold a strong reference not only to the sending object (which
is already part of the regular memoryview semantics), but *also* to
the sending interpreter - that way, neither the sending object nor the
sending interpreter can go away until the receiving interpreter closes
the view.

The refcount-integrity-ensuring sequence of events becomes:

1. Sending interpreter submits the object to the channel
2. Channel creates a CIV with references to the sending interpreter &
sending object, and a view on the sending object's memory
3. Receiving interpreter gets the CIV from the channel
4. Receiving interpreter closes the CIV, either explicitly or via
__del__ (the latter would emit a ResourceWarning)
5. CIV switches execution back to the sending interpreter and releases
both the memory buffer and the reference to the sending object
6. CIV switches execution back to the receiving interpreter, and
releases its reference to the sending interpreter
7. Execution continues in the receiving interpreter
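That sequence can be sketched as a toy class (a hypothetical API, not
something PEP 554 itself specifies) that keeps the sending
interpreter and object alive until the receiver closes the view:

```python
import warnings

class CrossInterpreterView:
    """Toy sketch of the CIV lifecycle above (hypothetical API):
    holds strong references to the sending interpreter and object
    until the receiving side explicitly closes the view."""

    def __init__(self, interp, obj):
        self._interp = interp          # strong ref: sender can't go away
        self._obj = obj                # strong ref: object can't go away
        self._view = memoryview(obj)   # view on the object's memory

    def close(self):
        # Steps 5-6: release the buffer, then drop the references to
        # the sending object and the sending interpreter.
        if self._view is not None:
            self._view.release()
            self._interp = self._obj = self._view = None

    def __del__(self):
        # Step 4, implicit variant: closing via __del__ warns.
        if self._view is not None:
            warnings.warn("unclosed CrossInterpreterView",
                          ResourceWarning)
            self.close()

# "sending-interp" is a placeholder for a real interpreter handle.
civ = CrossInterpreterView("sending-interp", b"payload")
payload = bytes(civ._view)   # receiver reads through the view
civ.close()                  # receiver closes explicitly (step 4)
assert payload == b"payload"
```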

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

