On Thu, Sep 7, 2017 at 11:19 PM, Nathaniel Smith <njs@pobox.com> wrote:
On Thu, Sep 7, 2017 at 8:11 PM, Eric Snow <ericsnowcurrently@gmail.com> wrote:
My concern is that this is a chicken-and-egg problem. The situation won't improve until subinterpreters are more readily available.
Okay, but you're assuming that "more libraries work well with subinterpreters" is in fact an improvement. I'm asking you to convince me of that :-). Are there people saying "oh, if only subinterpreters had a Python API and less weird interactions with C extensions, I could do <something awesome>"? So far they haven't exactly taken the world by storm...
The problem is that most people don't know about the feature. And even if they do, using it requires writing a C-extension, which most people aren't comfortable doing.
Other than C globals, is there some other issue?
That's the main one I'm aware of, yeah, though I haven't looked into it closely.
Oh, good. I haven't missed something. :) Do you know how often subinterpreter support is a problem for users? I was under the impression from your earlier statements that this is a recurring issue but my understanding from mod_wsgi is that it isn't that common.
I'm fine with Nick's idea about making this a "provisional" module. Would that be enough to ease your concern here?
Potentially, yeah -- basically I'm fine with anything that doesn't end up looking like python-dev telling everyone "subinterpreters are the future! go forth and yell at any devs who don't support them!".
Great! I'm also looking at the possibility of adding a mechanism for extension modules to opt out of subinterpreter support (using PEP 489 ModuleDef slots). However, I'd rather wait on that if making the PEP provisional is sufficient.
What do you think the criteria for graduating to non-provisional status should be, in this case?
Consensus among the (Dutch?) core devs that subinterpreters are worth keeping in the stdlib and that we've smoothed out any rough parts in the module.
I guess I would be much more confident in the possibilities here if you could give:
- some hand-wavy sketch for how subinterpreter A could call a function that was originally defined in subinterpreter B without the GIL, which seems like a precondition for sharing user-defined classes
(Before I respond, note that this is way outside the scope of the PEP. The merit of subinterpreters extends beyond any benefits of running sans-GIL, though that is my main goal. I've been updating the PEP to (hopefully) better communicate the utility of subinterpreters.)
Code objects are immutable so that part should be relatively straightforward. There's the question of closures and default arguments that would have to be resolved. However, those are things that would need to be supported anyway in a world where we want to pass functions and user-defined types between interpreters. Doing so will be a gradual process of starting with immutable non-container builtin types and expanding out from there to other immutable types, including user-defined ones.
Note that sharing mutable objects between interpreters would be a pretty advanced usage (i.e. opt-in shared state vs. threading's share-everything). If it proves desirable then we'd sort that out then. However, I don't see that as more than an esoteric feature relative to subinterpreters.
In my mind, the key advantage of being able to share more (immutable) objects, including user-defined types, between interpreters is in the optimization opportunities. It would allow us to avoid instantiating the same object in each interpreter. That said, the way I imagine it I wouldn't consider such an optimization to be very user-facing so it doesn't impact the PEP. The user-facing part would be the expanded set of immutable objects interpreters could pass back and forth, and expanding that set won't require any changes to the API in the PEP.
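To make the "start with immutable non-container builtins, then expand" idea concrete, here is a rough pure-Python sketch of what such a check might look like. The names `is_shareable` and `allow_containers` are made up for illustration; nothing like this is in the PEP, and the exact set of types is an assumption.

```python
# Hypothetical helper, NOT part of the PEP or of any stdlib module.
# Stage 1: immutable, non-container builtin types only (assumed set).
_SHAREABLE_SCALARS = (type(None), bool, int, float, complex, bytes, str)

# Stage 2 (assumed later expansion): immutable containers of shareable items.
_SHAREABLE_CONTAINERS = (tuple, frozenset)


def is_shareable(obj, allow_containers=False):
    """Return True if obj could be passed between interpreters in this sketch.

    Exact type checks are used on purpose: a subclass of int or str may
    carry mutable instance attributes, so it is rejected here.
    """
    if type(obj) in _SHAREABLE_SCALARS:
        return True
    if allow_containers and type(obj) in _SHAREABLE_CONTAINERS:
        # A container is only as immutable as everything inside it.
        return all(is_shareable(item, allow_containers=True) for item in obj)
    return False
```

Widening the set later only means adding types to the tables above; callers of `is_shareable` would not change, which is the same property claimed for the PEP's API.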
- some hand-wavy sketch for how refcounting will work for objects shared between multiple subinterpreters without the GIL, without majorly impacting single-thread performance (I actually forgot about this problem in my last email, because PyPy has already solved this part!)
(same caveat as above) There are a number of approaches that may work. One is to give each interpreter its own allocator and GC. Another is to mark shared objects such that they never get GC'ed. Another is to allow objects to exist only in one interpreter at a time. Similarly, object ownership (per interpreter) could help. Asynchronous refcounting could be an option. That's only some of the possible approaches. I expect that at least one of them will be suitable.
However, the first step is to get the multi-interpreter support out there. Then we can tackle the problem of optimization and multi-core utilization.
FWIW, the biggest complexity is actually in synchronizing the sharing strategy across the inter-interpreter boundary (e.g. FIFO). We should expect the relative time spent passing objects between interpreters to be very small. So not only does that provide us with a good target for our refcount resolving strategy, we can afford some performance wiggle room in that solution. (again, we're looking way ahead here)
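As a very rough illustration of the "objects exist in only one interpreter at a time" option, here is a single-threaded toy model in plain Python. It does not use real subinterpreters: the strings "A" and "B" merely stand in for interpreter IDs, and `Owned`, `Channel`, and `NotOwnerError` are invented names, not anything from the PEP.

```python
import queue


class NotOwnerError(RuntimeError):
    """Raised when a (simulated) interpreter touches an object it does not own."""


class Owned:
    """Wrap a value so that only its current owner may read it."""

    def __init__(self, value, owner):
        self._value = value
        self._owner = owner

    def get(self, caller):
        if caller != self._owner:
            raise NotOwnerError(f"{caller!r} does not own this object")
        return self._value


class Channel:
    """FIFO that transfers ownership along with the object."""

    def __init__(self):
        self._fifo = queue.Queue()

    def send(self, box, sender):
        box.get(sender)      # raises NotOwnerError unless sender is the owner
        box._owner = None    # in transit: no interpreter owns it
        self._fifo.put(box)

    def recv(self, receiver):
        box = self._fifo.get_nowait()  # raises queue.Empty if nothing is waiting
        box._owner = receiver          # receiver becomes the sole owner
        return box
```

In this model all the bookkeeping happens in `send` and `recv`, i.e. at the boundary, so ordinary access inside the owning interpreter needs no cross-interpreter coordination. That is the sense in which the boundary makes a good target for whatever refcount strategy is chosen.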
Thanks for attempting such an ambitious project :-).
Hey, I'm learning a lot and feel like every step along the way is making Python better in some stand-alone way. :) -eric