On Tue, Sep 12, 2017 at 1:46 PM, Eric Snow
On Thu, Sep 7, 2017 at 11:19 PM, Nathaniel Smith
On Thu, Sep 7, 2017 at 8:11 PM, Eric Snow
My concern is that this is a chicken-and-egg problem. The situation won't improve until subinterpreters are more readily available.
Okay, but you're assuming that "more libraries work well with subinterpreters" is in fact an improvement. I'm asking you to convince me of that :-). Are there people saying "oh, if only subinterpreters had a Python API and less weird interactions with C extensions, I could do <something awesome>"? So far they haven't exactly taken the world by storm...
The problem is that most people don't know about the feature. And even if they do, using it requires writing a C-extension, which most people aren't comfortable doing.
Other than C globals, is there some other issue?
That's the main one I'm aware of, yeah, though I haven't looked into it closely.
Oh, good. I haven't missed something. :) Do you know how often subinterpreter support is a problem for users? I was under the impression from your earlier statements that this is a recurring issue but my understanding from mod_wsgi is that it isn't that common.
It looks like we've been averaging one bug report every ~6 months for the last 3 years: https://github.com/numpy/numpy/issues?utf8=%E2%9C%93&q=is%3Aissue%20subinterpreter%20OR%20subinterpreters They mostly come from Jep, not mod_wsgi. (Possibly because Jep has some built-in numpy integration.) I don't know how many people file bugs versus just living with it or finding some workaround. I suspect for mod_wsgi in particular they probably switch to something else -- it's not like there's any shortage of WSGI servers that avoid these problems. And for Jep there are prominent warnings to expect problems and suggesting workarounds: https://github.com/ninia/jep/wiki/Workarounds-for-CPython-Extensions
I guess I would be much more confident in the possibilities here if you could give:
- some hand-wavy sketch for how subinterpreter A could call a function that was originally defined in subinterpreter B without the GIL, which seems like a precondition for sharing user-defined classes
(Before I respond, note that this is way outside the scope of the PEP. The merit of subinterpreters extends beyond any benefits of running sans-GIL, though that is my main goal. I've been updating the PEP to (hopefully) better communicate the utility of subinterpreters.)
Subinterpreters are basically an attempt to reimplement the OS's process isolation in user-space, right? Classic trade-off where we accept added complexity and fragility in the hopes of gaining some speed? I just looked at the PEP again, and I'm afraid I still don't understand what the benefits are unless we can remove the GIL and somehow get a speedup over processes. Implementing CSP is a neat idea, but you could do it with subprocesses too. AFAICT you could implement the whole subinterpreters module API with subprocesses on 3.6, and it'd be multi-core and have perfect extension module support.
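The subprocess claim is easy to sketch. This is a toy stand-in, not the PEP's actual API: the names `Interpreter`, `run`, and `channel` are made up for illustration, and it uses nothing beyond the 3.6 stdlib:

```python
# Hypothetical sketch: a subprocess-backed stand-in for an
# interpreters-style API. Names are illustrative, not the PEP's.
import multiprocessing

def _worker(code, conn):
    # Run the source in a fresh process, exposing one end of a pipe
    # to the code under the name "channel".
    exec(code, {"channel": conn})
    conn.close()

class Interpreter:
    def run(self, code):
        parent, child = multiprocessing.Pipe()
        proc = multiprocessing.Process(target=_worker, args=(code, child))
        proc.start()
        result = parent.recv()      # one object back over the "channel"
        proc.join()
        return result

if __name__ == "__main__":
    interp = Interpreter()
    print(interp.run("channel.send('spam' * 2)"))
```

It's multi-core (separate processes), and extension modules behave exactly as they always do, since each "interpreter" is an ordinary CPython process.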
Code objects are immutable, so that part should be relatively straightforward. There's the question of closures and default arguments that would have to be resolved. However, those are things that would need to be supported anyway in a world where we want to pass functions and user-defined types between interpreters. Doing so will be a gradual process of starting with immutable non-container builtin types and expanding out from there to other immutable types, including user-defined ones.
I tried arguing that code objects were immutable to the PyPy devs too :-). The problem is that to call a function you need both its __code__, which is immutable, and its __globals__, which is emphatically not. The __globals__ thing means that if you start from an average function you can often follow pointers to reach every other global object (e.g. if the function uses regular expressions, you can probably reach any module by doing func.__globals__["re"].sys.modules[...]). You might hope that you could somehow restrict this, but I can't think of any way that's really useful :-(.
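The __globals__ escape hatch is easy to demonstrate; in this runnable sketch (the module-level names are just for illustration), any function hands you a live, mutable view of its module's state:

```python
# A function's __code__ is immutable, but its __globals__ is the live
# module dict, from which the rest of the interpreter is reachable.
import sys

GLOBAL_CONFIG = {"mode": "fast"}

def handler():
    return GLOBAL_CONFIG["mode"]

g = handler.__globals__
# Any imported module is one lookup away, and from sys you can reach
# every other loaded module via sys.modules:
assert g["sys"].modules is sys.modules
# Mutating shared module state changes the function's behavior:
g["GLOBAL_CONFIG"]["mode"] = "slow"
assert handler() == "slow"
```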
Note that sharing mutable objects between interpreters would be a pretty advanced usage (i.e. opt-in shared state vs. threading's share-everything). If it proves desirable then we'd sort that out then. However, I don't see that as more than an esoteric feature relative to subinterpreters.
In my mind, the key advantage of being able to share more (immutable) objects, including user-defined types, between interpreters is in the optimization opportunities.
But even if we can add new language features for "freezing" user-defined objects, then their .__class__ will still be mutable, their methods will still have mutable .__globals__, etc. And even if we somehow made it so their methods only got a read-only view on the original interpreter's state, that still wouldn't protect against race conditions and memory corruption, because the original interpreter might be mutating things while the subinterpreter is looking at them.
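That point doesn't even need new language features to show; in this sketch (the `Frozen` class is hypothetical), the instance itself rejects mutation, but its class, and therefore its behavior, stays mutable:

```python
class Frozen:
    # A hypothetical "frozen" user-defined type: instances reject mutation.
    __slots__ = ("x",)

    def __init__(self, x):
        object.__setattr__(self, "x", x)  # bypass our own __setattr__

    def __setattr__(self, name, value):
        raise AttributeError("instances of Frozen are immutable")

f = Frozen(1)
try:
    f.x = 99                      # instance-level mutation is blocked...
except AttributeError:
    pass
# ...but the class is still an ordinary mutable object, so another
# thread or interpreter could change behavior out from under you:
Frozen.doubled = lambda self: self.x * 2
assert f.doubled() == 2
```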
It would allow us to avoid instantiating the same object in each interpreter. That said, the way I imagine it I wouldn't consider such an optimization to be very user-facing so it doesn't impact the PEP. The user-facing part would be the expanded set of immutable objects interpreters could pass back and forth, and expanding that set won't require any changes to the API in the PEP.
- some hand-wavy sketch for how refcounting will work for objects shared between multiple subinterpreters without the GIL, without majorly impacting single-thread performance (I actually forgot about this problem in my last email, because PyPy has already solved this part!)
(same caveat as above)
There are a number of approaches that may work. One is to give each interpreter its own allocator and GC.
This makes sense to me if the subinterpreters aren't going to share any references, but obviously that wasn't the question :-). And I don't see how this makes it easier to work with references that cross between different GC domains. If anything it seems like it would make things harder.
Another is to mark shared objects such that they never get GC'ed.
I don't think just leaking everything all the time is viable :-(. And even so this requires traversing the whole object reference graph on every communication operation, which defeats the purpose; the whole idea here was to find something that doesn't have to walk the object graph, because that's what makes pickle slow.
Another is to allow objects to exist only in one interpreter at a time.
Yeah, like rust -- very neat if we can do it! If I want to give you this object, I can't have it anymore myself. But... I haven't been able to think of any way we could actually enforce this efficiently. When passing an object between subinterpreters you could require that the root object has refcount 1, and then do like a little mini-mark-and-sweep to make sure all the objects reachable from it only have references that are within the local object graph. But then we're traversing the object graph again. This would also require new syntax, because you need something like a simultaneous send-and-del, and you can't fake a del with a function call. (It also adds another extension module incompatibility, because it would require every extension type to implement tp_traverse -- previously this was only mandatory for objects that could indirectly reference themselves. But maybe the extension type thing doesn't matter because this would have to be restricted to builtin immutable types anyway, as per above.)
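The cost of that mini-mark-and-sweep is the graph walk itself. A minimal sketch of the traversal (the `reachable` helper is made up, built on `gc.get_referents`) shows why any such check scales with the size of the object graph:

```python
import gc

def reachable(root):
    # Walk every object reachable from root -- the traversal that an
    # ownership-transfer check (or pickle) has to pay for on every send.
    seen = {id(root): root}       # keep refs so ids stay stable
    stack = [root]
    while stack:
        for ref in gc.get_referents(stack.pop()):
            if id(ref) not in seen:
                seen[id(ref)] = ref
                stack.append(ref)
    return list(seen.values())

inner = [2, 3]
outer = [1, inner]
objs = reachable(outer)
assert any(o is inner for o in objs)   # nested objects are all visited
```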
Similarly, object ownership (per interpreter) could help.
Asynchronous refcounting could be an option.
Right, like in the GILectomy branch. This is the only potentially viable solution I can think of (short of dropping refcounting altogether, which I think is how most multi-core languages solve this, and why PyPy has a head start on GIL removal). Do we know yet how much single-thread overhead this adds? It makes me nervous too -- a lot of the attraction of subinterpreters is that the minimal shared state is supposed to make GIL removal easier and less risky than if we were attempting a full GILectomy. But in this case I don't see how to exploit their isolation at all.
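For the record, the buffered-refcount idea can be modeled in a few lines (this is a toy sketch, not the GILectomy implementation): non-owning interpreters only enqueue refcount deltas, and the owning interpreter alone ever touches the real count:

```python
import queue

class SharedRef:
    # Toy model of asynchronous ("buffered") refcounting: remote
    # interpreters never touch the real refcount; they enqueue deltas
    # that the owning interpreter applies in batches.
    def __init__(self, obj):
        self.obj = obj
        self.refcount = 1                # owner's initial reference
        self.pending = queue.Queue()     # deltas from other interpreters

    def incref_remote(self):
        self.pending.put(+1)             # no contention on refcount itself

    def decref_remote(self):
        self.pending.put(-1)

    def flush(self):
        # Called periodically by the owning interpreter.
        while not self.pending.empty():
            self.refcount += self.pending.get()
        if self.refcount <= 0:
            self.obj = None              # stand-in for deallocation

ref = SharedRef(object())
ref.incref_remote()        # another interpreter takes a reference
ref.decref_remote()        # ...and later releases it
ref.flush()
assert ref.refcount == 1   # only the owner's reference remains
```

The single-thread overhead question then becomes: how much does routing every cross-interpreter incref/decref through a queue (and periodically flushing it) cost compared to a plain `Py_INCREF`?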
That's only some of the possible approaches. I expect that at least one of them will be suitable.
The reason I'm pushing on this is exactly because I don't expect that; I think it's very likely that you'll spend a bunch of time on the fun easier parts, and then discover that actually the hard parts are impossible. If you want me to shut up and leave you to it, say the word :-).
However, the first step is to get the multi-interpreter support out there. Then we can tackle the problem of optimization and multi-core utilization.
FWIW, the biggest complexity is actually in synchronizing the sharing strategy across the inter-interpreter boundary (e.g. FIFO). We should expect the relative time spent passing objects between interpreters to be very small. So not only does that provide us with a good target for our refcount-resolving strategy, we can afford some performance wiggle room in that solution. (again, we're looking way ahead here)
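To make that split concrete, here is a toy model (threads standing in for interpreters; the `Channel` class is made up): the object handoff itself is a constant-time append, and essentially all of the expense is in coordinating access to the FIFO:

```python
import threading
from collections import deque

class Channel:
    # Toy FIFO between two "interpreters" (threads here). The handoff
    # is an O(1) pointer move; the cost is the coordination around it.
    def __init__(self):
        self._items = deque()
        self._ready = threading.Condition()

    def send(self, obj):
        with self._ready:            # the synchronization cost
            self._items.append(obj)  # the (cheap) transfer itself
            self._ready.notify()

    def recv(self):
        with self._ready:
            while not self._items:
                self._ready.wait()
            return self._items.popleft()

ch = Channel()
t = threading.Thread(target=lambda: ch.send("spam"))
t.start()
assert ch.recv() == "spam"
t.join()
```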
But if the only advantage of subinterpreters over subprocesses is that the communication costs are lower, then surely you should be targeting cases where the communication costs are high? -n -- Nathaniel J. Smith -- https://vorpus.org