
Hi all,

Personally I feel that the current subinterpreter support falls short in that it still requires a single GIL shared across all interpreters. If each interpreter had its own GIL, we could have true shared-nothing multi-threaded support, similar to JavaScript's "Web Workers".

Here is a point-by-point overview of what I am imagining. I realize the following is very ambitious, but I would like to bring it to your consideration.

1. Multiple interpreters can be instantiated, each of which is completely independent. To this end, all global interpreter state needs to go into an interpreter structure, including the GIL (which becomes per-interpreter). Interpreters share no state whatsoever.

2. PyObjects are tied to a particular interpreter and cannot be shared between interpreters. (This is because each interpreter now has its own GIL.) I imagine a special debug build would actually store the interpreter pointer in the PyObject and would assert everywhere that the PyObject is only manipulated by its owning interpreter.

3. Practically all existing APIs, including Py_INCREF and Py_DECREF, need to get an additional explicit interpreter argument. I imagine that we would have a new prefix, say MPy_, because the existing APIs must be left in place for backward compatibility.

4. At most one interpreter can be designated the "main" interpreter. This is for backward compatibility of existing extension modules ONLY. All the existing Py_* APIs operate implicitly on this main interpreter.

5. Extension modules need to explicitly advertise multiple-interpreter support. If they don't, they can only be imported in the main interpreter. However, in that case they can safely use the existing Py_* APIs.

6. Since PyObjects cannot be shared across interpreters, there needs to be an explicit function which takes a PyObject in interpreter A and constructs a similar object in interpreter B. Conceptually this would be equivalent to pickling in A and unpickling in B, but presumably more efficient. It would use the copyreg registry in a similar way to pickle.

7. Extension modules would also be able to register their own functions for copying custom types across interpreters. That would allow extension modules to provide custom types where the underlying C object is in fact not copied but shared between interpreters. I would imagine we would have a "shared memory" memoryview object, and also Mutex and other locking constructs which would work across interpreters.

8. Finally, the main application: functionality similar to the current `multiprocessing' module, but with multiple interpreters on multiple threads in a single process. This would presumably be more efficient than `multiprocessing' and also allow extra functionality, since the underlying C objects can in fact be shared. (Imagine two interpreters operating in parallel on a single OpenCL context.)

Stephan

On 26 May 2017 at 10:41 a.m., "Petr Viktorin" <encukou@gmail.com> wrote:
On 05/25/2017 09:01 PM, Eric Snow wrote:
On Thu, May 25, 2017 at 11:19 AM, Nathaniel Smith <njs@pobox.com> wrote:
My impression is that the code to support them inside CPython is fine, but they're broken and not very useful in the sense that lots of C extensions don't really support them, so in practice you can't reliably use them to run arbitrary code. Numpy for example definitely has lots of subinterpreter-related bugs, and when they get reported we close them as WONTFIX.
Based on conversations at last year's PyCon, my impression is that numpy probably *could* support subinterpreters (i.e. the required APIs exist), but none of us really understand the details; it's the kind of problem that requires a careful whole-codebase audit, and a naive approach might make numpy's code slower and more complicated for everyone. (For example, there are lots of places where numpy keeps a little global cache that I guess should instead be per-subinterpreter caches, which would mean adding an extra lookup operation to fast paths.)
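To make the cache trade-off in the paragraph above concrete, here is a minimal Python sketch. It is not numpy's actual code: the function names and the integer `interp_id` are hypothetical, and the real caches live in C. It only shows why a per-subinterpreter cache costs one more lookup than a single global one.

```python
from collections import defaultdict

# One process-wide cache: a single dict lookup on the fast path,
# but every interpreter would see (and could clobber) the same entries.
_global_cache = {}


def global_put(key, value):
    _global_cache[key] = value


def global_get(key):
    return _global_cache.get(key)


# One cache per interpreter, keyed by a hypothetical interpreter id.
_per_interp_caches = defaultdict(dict)


def interp_put(interp_id, key, value):
    _per_interp_caches[interp_id][key] = value


def interp_get(interp_id, key):
    cache = _per_interp_caches[interp_id]  # the extra lookup on the fast path
    return cache.get(key)
```

With this layout, an entry stored for one interpreter id is simply absent for another, which is the isolation that subinterpreter support would need.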
Thanks for pointing this out. You've clearly described probably the biggest challenge for folks that try to use subinterpreters. PEP 384 was meant to help with this, but seems to have fallen short. PEP 489 can help identify modules that profess subinterpreter support, as well as facilitating future extension module helpers to deal with global state. However, I agree that *right now* getting extension modules to reliably work with subinterpreters is not easy enough. Furthermore, that won't change unless there is sufficient benefit tied to subinterpreters, as you point out below.
PEP 489 was a first step; the work is not finished. The next step is solving a major reason people are using global state in extension modules: per-module state isn't accessible from all the places it should be, namely in methods of classes. In other words, I don't think Python is ready for big projects like Numpy to start properly supporting subinterpreters.
The work on fixing this has stalled, but it looks like I'll be getting back on track. Discussions about this are on the import-sig list, reach out there if you'd like to help.
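As a rough illustration of points 6 and 8 in the proposal at the top of this message, the copy-by-pickle idea can be sketched in today's Python. This is only a stand-in under stated assumptions: the worker is an ordinary thread in the same interpreter (so it still shares the GIL), and `Point`, `copy_across`, `worker` and `run_demo` are hypothetical names, not a proposed API. The only real machinery used is `pickle`, `copyreg.pickle`, `queue.Queue` and `threading.Thread`.

```python
import copyreg
import pickle
import queue
import threading


class Point:
    """Hypothetical custom type that needs to cross the boundary."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


def _reduce_point(p):
    # Tell pickle how to rebuild a Point: call Point(x, y) on the other side.
    return Point, (p.x, p.y)


# Register the reducer in the copyreg registry, as pickle itself would use it.
copyreg.pickle(Point, _reduce_point)


def copy_across(obj):
    """Stand-in for the proposed A-to-B copy: serialize, then rebuild."""
    return pickle.loads(pickle.dumps(obj))


def worker(inbox, outbox):
    # The worker only ever receives bytes, never the sender's objects.
    p = pickle.loads(inbox.get())
    outbox.put(pickle.dumps(Point(p.x * 2, p.y * 2)))


def run_demo():
    inbox, outbox = queue.Queue(), queue.Queue()
    t = threading.Thread(target=worker, args=(inbox, outbox))
    t.start()
    original = Point(1, 2)
    inbox.put(pickle.dumps(original))
    result = pickle.loads(outbox.get())
    t.join()
    return original, result
```

Calling `run_demo()` returns the sender's untouched `Point` together with a separately constructed one built by the worker. In the proposal, the byte-level round trip would presumably be replaced by a more direct copy function, and the thread by a real interpreter with its own GIL.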