[Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code

Nathaniel Smith njs at pobox.com
Wed Sep 13 00:10:12 EDT 2017

On Tue, Sep 12, 2017 at 1:46 PM, Eric Snow <ericsnowcurrently at gmail.com> wrote:
> On Thu, Sep 7, 2017 at 11:19 PM, Nathaniel Smith <njs at pobox.com> wrote:
>> On Thu, Sep 7, 2017 at 8:11 PM, Eric Snow <ericsnowcurrently at gmail.com> wrote:
>>> My concern is that this is a chicken-and-egg problem.  The situation
>>> won't improve until subinterpreters are more readily available.
>> Okay, but you're assuming that "more libraries work well with
>> subinterpreters" is in fact an improvement. I'm asking you to convince
>> me of that :-). Are there people saying "oh, if only subinterpreters
>> had a Python API and less weird interactions with C extensions, I
>> could do <something awesome>"? So far they haven't exactly taken the
>> world by storm...
> The problem is that most people don't know about the feature.  And
> even if they do, using it requires writing a C-extension, which most
> people aren't comfortable doing.
>>> Other than C globals, is there some other issue?
>> That's the main one I'm aware of, yeah, though I haven't looked into it closely.
> Oh, good.  I haven't missed something. :)  Do you know how often
> subinterpreter support is a problem for users?  I was under the
> impression from your earlier statements that this is a recurring issue
> but my understanding from mod_wsgi is that it isn't that common.

It looks like we've been averaging one bug report every ~6 months for
the last 3 years:

They mostly come from Jep, not mod_wsgi. (Possibly because Jep has
some built-in numpy integration.) I don't know how many people file
bugs versus just living with it or finding some workaround. I suspect
for mod_wsgi in particular they probably switch to something else --
it's not like there's any shortage of WSGI servers that avoid these
problems. And for Jep there are prominent warnings to expect problems
and suggesting workarounds:

>> I guess I would be much more confident in the possibilities here if
>> you could give:
>> - some hand-wavy sketch for how subinterpreter A could call a function
>> that was originally defined in subinterpreter B without the GIL, which
>> seems like a precondition for sharing user-defined classes
> (Before I respond, note that this is way outside the scope of the PEP.
> The merit of subinterpreters extends beyond any benefits of running
> sans-GIL, though that is my main goal.  I've been updating the PEP to
> (hopefully) better communicate the utility of subinterpreters.)

Subinterpreters are basically an attempt to reimplement the OS's
process isolation in user-space, right? Classic trade-off where we
accept added complexity and fragility in the hopes of gaining some
speed? I just looked at the PEP again, and I'm afraid I still don't
understand what the benefits are unless we can remove the GIL and
somehow get a speedup over processes. Implementing CSP is a neat idea,
but you could do it with subprocesses too. AFAICT you could implement
the whole subinterpreters module API with subprocesses on 3.6, and
it'd be multi-core and have perfect extension module support.
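To make that concrete, here's a minimal sketch of CSP-style message passing over subprocesses -- not the proposed PEP 554 API, just an illustration using multiprocessing (the `worker` name is made up, and it uses the POSIX-only fork start method for brevity):

```python
import multiprocessing as mp

ctx = mp.get_context("fork")  # fork for brevity; assumes a POSIX system

def worker(inbox, outbox):
    # Runs in its own process: separate heap, separate GIL, and
    # extension modules behave exactly as they do in any interpreter.
    for item in iter(inbox.get, None):  # None is the shutdown sentinel
        outbox.put(item * 2)

inbox, outbox = ctx.SimpleQueue(), ctx.SimpleQueue()
p = ctx.Process(target=worker, args=(inbox, outbox))
p.start()
inbox.put(21)
result = outbox.get()
inbox.put(None)
p.join()
```

The channel semantics (blocking get/put over an isolated boundary) are the same shape as what the PEP describes, with the serialization cost of the queue standing in for whatever sharing mechanism subinterpreters would use.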

> Code objects are immutable so that part should be relatively
> straightforward.  There's the question of closures and default
> arguments that would have to be resolved.  However, those are things
> that would need to be supported anyway in a world where we want to
> pass functions and user-defined types between interpreters.  Doing so
> will be a gradual process of starting with immutable non-container
> builtin types and expanding out from there to other immutable types,
> including user-defined ones.

I tried arguing that code objects were immutable to the PyPy devs too
:-). The problem is that to call a function you need both its
__code__, which is immutable, and its __globals__, which is
emphatically not. The __globals__ thing means that if you start from
an average function you can often follow pointers to reach every other
global object (e.g. if the function uses regular expressions, you can
probably reach any module by doing
func.__globals__["re"].sys.modules[...]). You might hope that you
could somehow restrict this, but I can't think of any way that's
really useful :-(.
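For instance (hedging: the exact attribute chain varies between modules and Python versions, so this sketch imports sys explicitly rather than relying on re's internals):

```python
import re
import sys

def find_words(text):
    # The function only uses re, but its __globals__ is the entire
    # live module namespace, not just the names it touches.
    return re.findall(r"\w+", text)

g = find_words.__globals__
assert g["re"] is re
# From the namespace we can keep walking: sys.modules reaches every
# loaded module, and from there essentially every global object.
assert g["sys"].modules["re"] is re
```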

> Note that sharing mutable objects between interpreters would be a
> pretty advanced usage (i.e. opt-in shared state vs. threading's
> share-everything).  If it proves desirable then we'd sort that out
> then.  However, I don't see that as more than an esoteric feature
> relative to subinterpreters.
> In my mind, the key advantage of being able to share more (immutable)
> objects, including user-defined types, between interpreters is in the
> optimization opportunities.

But even if we can add new language features for "freezing"
user-defined objects, then their .__class__ will still be mutable,
their methods will still have mutable .__globals__, etc. And even if
we somehow made it so their methods only got a read-only view on the
original interpreter's state, that still wouldn't protect against race
conditions and memory corruption, because the original interpreter
might be mutating things while the subinterpreter is looking at them.
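A small illustration of why freezing instances isn't enough -- even with __slots__ the class object and the methods' globals stay mutable:

```python
class Frozen:
    __slots__ = ("value",)  # instances can't grow new attributes
    def __init__(self, value):
        self.value = value

f = Frozen(42)
# The instance layout is fixed, but the class itself is freely mutable:
Frozen.added_later = "surprise"
assert f.added_later == "surprise"
# ...and its methods still reference the live, mutable module namespace:
assert Frozen.__init__.__globals__ is globals()
```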

> It would allow us to avoid instantiating
> the same object in each interpreter.  That said, the way I imagine it
> I wouldn't consider such an optimization to be very user-facing so it
> doesn't impact the PEP.  The user-facing part would be the expanded
> set of immutable objects interpreters could pass back and forth, and
> expanding that set won't require any changes to the API in the PEP.
>> - some hand-wavy sketch for how refcounting will work for objects
>> shared between multiple subinterpreters without the GIL, without
>> majorly impacting single-thread performance (I actually forgot about
>> this problem in my last email, because PyPy has already solved this
>> part!)
> (same caveat as above)
> There are a number of approaches that may work.  One is to give each
> interpreter its own allocator and GC.

This makes sense to me if the subinterpreters aren't going to share
any references, but obviously that wasn't the question :-). And I
don't see how this makes it easier to work with references that cross
between different GC domains. If anything it seems like it would make
things harder.

> Another is to mark shared
> objects such that they never get GC'ed.

I don't think just leaking everything all the time is viable :-(. And
even so this requires traversing the whole object reference graph on
every communication operation, which defeats the purpose; the whole
idea here was to find something that doesn't have to walk the object
graph, because that's what makes pickle slow.
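Roughly, any such marking scheme has to do a walk like this (a sketch using gc.get_referents) on every send:

```python
import gc

def mark_reachable(root):
    # Visit everything reachable from root -- the traversal a
    # "never GC these" scheme would need before each send.
    seen = {id(root): root}
    stack = [root]
    while stack:
        for ref in gc.get_referents(stack.pop()):
            if id(ref) not in seen:
                seen[id(ref)] = ref
                stack.append(ref)
    return seen

# Even a tiny nested structure fans out (dict, keys, values, items...);
# real object graphs make this walk as costly as the pickle traversal
# we were trying to avoid.
data = {"nums": [10, 20], "name": "demo"}
assert len(mark_reachable(data)) >= 5
```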

> Another is to allow objects
> to exist only in one interpreter at a time.

Yeah, like Rust -- very neat if we can do it! If I want to give you
this object, I can't have it anymore myself. But... I haven't been
able to think of any way we could actually enforce this efficiently.
When passing an object between subinterpreters you could require that
the root object has refcount 1, and then do a little
mini mark-and-sweep to make sure all the objects reachable from it
only have references that are within the local object graph. But then
we're traversing the object graph again.

This would also require new syntax, because you need something like a
simultaneous send-and-del, and you can't fake a del with a function.
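Here's a hedged sketch of both points -- `send_owned` and `fake_send` are made-up names, and the refcount threshold is measured rather than hardcoded because the number of references CPython's calling convention adds varies across versions:

```python
import sys

def _probe(obj):
    # A fresh object passed here has no owners besides the call
    # machinery itself, so this measures the machinery's overhead.
    return sys.getrefcount(obj)

_CALL_OVERHEAD = _probe(object())

def send_owned(obj):
    # Hypothetical "move" primitive: refuse if anything beyond the
    # caller's own binding still references obj. A real version would
    # also need the mini mark-and-sweep over the reachable subgraph --
    # i.e. yet another full graph traversal.
    if sys.getrefcount(obj) - _CALL_OVERHEAD > 1:
        raise RuntimeError("object is still referenced elsewhere")
    return obj

unique = [1, 2, 3]
send_owned(unique)          # ok: the caller's binding is the only owner

shared = [4, 5, 6]
alias = shared              # a second reference exists
try:
    send_owned(shared)
    moved = True
except RuntimeError:
    moved = False

# And the send-and-del problem: del inside a function only unbinds the
# callee's local name, so the caller's reference survives the "send".
def fake_send(obj):
    del obj  # drops this frame's local only

fake_send(unique)
still_alive = unique        # the caller still holds the object
```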

(It also adds another extension module incompatibility, because it
would require every extension type to implement tp_traverse --
previously this was only mandatory for objects that could indirectly
reference themselves. But maybe the extension type thing doesn't
matter because this would have to be restricted to builtin immutable
types anyway, as per above.)

> Similarly, object ownership (per interpreter) could help.


> Asynchronous refcounting could be an option.

Right, like in the GILectomy branch. This is the only potentially
viable solution I can think of (short of dropping refcounting
altogether, which I think is how most multi-core languages solve this,
and why PyPy has a head start on GIL removal). Do we know yet how much
single-thread overhead this adds?

It makes me nervous too -- a lot of the attraction of subinterpreters
is that the minimal shared state is supposed to make GIL removal
easier and less risky than if we were attempting a full GILectomy. But
in this case I don't see how to exploit their isolation at all.

> That's only some of the possible approaches.  I
> expect that at least one of them will be suitable.

The reason I'm pushing on this is exactly because I don't expect that;
I think it's very likely that you'll spend a bunch of time on the fun
easier parts, and then discover that actually the hard parts are
impossible. If you want me to shut up and leave you to it, say the
word :-).

> However, the first
> step is to get the multi-interpreter support out there.  Then we can
> tackle the problem of optimization and multi-core utilization.
> FWIW, the biggest complexity is actually in synchronizing the sharing
> strategy across the inter-interpreter boundary (e.g. FIFO).  We should
> expect the relative time spent passing objects between interpreters to
> be very small.  So not only does that provide us with a good target
> for our refcount resolving strategy, we can afford some performance
> wiggle room in that solution.  (again, we're looking way ahead here)

But if the only advantage of subinterpreters over subprocesses is that
the communication costs are lower, then surely you should be targeting
cases where the communication costs are high?


Nathaniel J. Smith -- https://vorpus.org
