[Python-ideas] The future of Python parallelism. The GIL. Subinterpreters. Actors.
Eric Snow
ericsnowcurrently at gmail.com
Wed Jul 18 15:35:15 EDT 2018
On Wed, Jul 18, 2018 at 12:49 PM Stephan Houben <stephanh42 at gmail.com> wrote:
> Antoine said that what I proposed earlier was very similar to what Eric
> is trying to do, but from the direction the discussion has taken so far
> that appears not to be the case.
It looks like we are after the same thing actually. :) Sorry for any confusion.
There are currently no provisions for actually sharing objects between
interpreters. In fact, initially the plan is basically to support
sharing copies of basic builtin immutable types. The question of
refcounts comes in when we actually do share underlying data of
immutable objects (e.g. the buffer protocol).
> I will therefore try to clarify my proposal.
>
> Basically, what I am suggesting is a direct translation of Javascript's
> Web Worker API (https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API)
> to Python.
>
> The Web Worker API is generally considered a "share-nothing" approach, although
> as we will see some state can be shared.
Yes, there's a strong parallel to that model here. In fact, I
mentioned web workers in my language summit talk at PyCon 2018.
> The basic principle is that any object lives in a single Worker (Worker = subinterpreter).
> If a message is sent from Worker A to Worker B, the message is not shared,
> rather the so-called "structured clone" algorithm is used to create recursively a NEW message
> object in Worker B. This is roughly equivalent to pickling in A and then unpickling in B,
That is exactly what the channels in the PEP 554 implementation do,
though much more efficiently than pickling. Initial support will be
for basic builtin immutable types. We can later consider support for
other (even arbitrary?) types, but anything beyond copying (e.g.
pickle) is way off my radar. Python's C-API is so closely tied to
refcounting that we simply cannot support safely sharing actual Python
objects between interpreters once we no longer share the GIL between
them.
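To illustrate the copy-on-send semantics discussed above, here is a minimal sketch that uses pickle as a stand-in for the cloning step. This is not the PEP 554 channel implementation (which, as noted, avoids pickling); the helper name clone_for_send is hypothetical and exists only for this example.

```python
import pickle


def clone_for_send(obj):
    """Hypothetical helper: emulate "structured clone" with a pickle round trip.

    The receiver gets a brand-new object graph; nothing is shared with
    the sender's copy.
    """
    return pickle.loads(pickle.dumps(obj))


# The "sender" owns this message.
original = {"name": "msg", "values": [1, 2, 3]}

# The "receiver" gets an independent copy.
received = clone_for_send(original)

# Mutating the received copy does not affect the sender's object.
received["values"].append(4)
```

Because the receiver only ever sees a copy, neither side touches the other's refcounts, which is the property that matters once interpreters stop sharing a GIL.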
> Of course, this may become a bottleneck if large amounts of data need to be communicated.
> Therefore, there is a special object type designed to provide a view upon a piece
> of shared memory: SharedArrayBuffer. Notably, this only provides a view upon
> raw "C"-style data (ints or floats or whatever), not on Javascript objects.
Yep, that translates to buffers in Python, which is covered by PEP 554
(see SendChannel.send_buffer).
In this case, where some underlying data is actually shared, the
implementation has to deal with keeping a reference to the original
object and releasing it when done, which is what all the talk of
refcounts has been about. However, the PEP does not talk about it
because it is an implementation detail that is not exposed in Python.
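The "keep a reference to the original object and release it when done" idea can be seen in miniature with the buffer protocol inside a single interpreter. The sketch below uses only the builtin bytearray and memoryview types as stand-ins; it does not use SendChannel.send_buffer or any other PEP 554 API.

```python
# The exporter owns the underlying memory.
data = bytearray(b"hello")

# A memoryview shares that memory rather than copying it, and keeps
# the exporter alive (and pinned) for as long as the view exists.
view = memoryview(data)

# A write through the view is visible in the original object.
view[0] = ord("j")

# While the view is outstanding, the exporter refuses to resize,
# because that could invalidate the shared memory.
try:
    data.extend(b"!")
    resize_blocked = False
except BufferError:
    resize_blocked = True

# Releasing the view drops its hold on the exporter...
view.release()

# ...after which the exporter may be resized again.
data.extend(b"!")
```

Across interpreters the same bookkeeping is needed, except that the release has to be handled safely with respect to the interpreter that owns the original object.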
> To translate this to the Python situation: each Python object is owned by a single
> subinterpreter, and may only be manipulated by a thread which holds the GIL
> of that particular subinterpreter. Message sending between subinterpreters will
> require the message objects to be "structured cloned".
Correct. That is what PEP 554 does.
As an aside, your phrasing "may only be manipulated by a thread which
holds the GIL of that particular subinterpreter" did spark something
I'll consider later: perhaps interpreters can acquire each other's
GIL when (infrequently) necessary. That could simplify a few things.
> Certain C extension types may override what structured cloning means for them.
> In particular, some C extension types may have a two-layer structure where
> the Py_Object contains a refcounted pointer to the actual data.
> The structured cloning on such an object may create a second Py_Object which
> references the same underlying object.
> This secondary refcount will need to be properly atomic, since it may be manipulated
> from multiple subinterpreters.
My implementation of PEP 554 supports this, though I have not made the
C-API for it public. It's also not part of the PEP. I was
considering adding it.
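A rough Python-level sketch of the two-layer structure described in the quoted text follows. It is only an illustration under loud assumptions: threads stand in for subinterpreters, a threading.Lock stands in for an atomic counter, and the names SharedData and Wrapper are hypothetical. It does not reflect the unpublished C-API mentioned above.

```python
import threading


class SharedData:
    """Inner layer: the actual data plus its own ("secondary") refcount."""

    def __init__(self, payload):
        self.payload = payload
        self.freed = False
        self._refs = 1                  # the creator holds the first reference
        self._lock = threading.Lock()   # stand-in for atomic inc/dec

    def incref(self):
        with self._lock:
            self._refs += 1

    def decref(self):
        with self._lock:
            self._refs -= 1
            if self._refs == 0:
                self.freed = True       # real code would free the memory here

    def refcount(self):
        with self._lock:
            return self._refs


class Wrapper:
    """Outer layer: an ordinary per-interpreter object pointing at SharedData."""

    def __init__(self, shared):
        self._shared = shared           # takes ownership of one reference

    def clone(self):
        # "Structured clone" override: new wrapper, same underlying data.
        self._shared.incref()
        return Wrapper(self._shared)

    def close(self):
        if self._shared is not None:
            self._shared.decref()
            self._shared = None


shared = SharedData(b"raw bytes")
first = Wrapper(shared)


def worker():
    # Each worker repeatedly clones and drops its own wrapper.
    for _ in range(1000):
        first.clone().close()


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

refs_after_workers = shared.refcount()
first.close()
```

Only the inner counter needs to be safe under concurrent access; the wrappers themselves stay private to whichever side created them and can keep using ordinary refcounting.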
> In this way, interpreter-shared data structures can be implemented.
> However, all the "normal" Python objects are not shared and can continue
> to use the current, non-atomic refcounting implementation.
That is correct. That entirely matches what I'm doing with PEP 554.
In fact, the isolation between interpreters is critical to my
multi-core Python project, of which PEP 554 is a part. It's necessary
in order to stop sharing the GIL between interpreters. So actual
objects will never be shared between interpreters. They can't be.
> Hope this clarifies my proposal.
Yep. Thanks!
-eric