[Python-Dev] Sub-interpreters: importing numpy causes hang

Wed Jan 23 14:41:24 EST 2019

You all do make me feel very welcome in this community! Thank you very much! :-)

And thank you for all the thought and time you put into your message,
Eric. I do appreciate in particular all the alternatives you
presented; you provide a good picture of my options.
Not ruling out any of them, I'll stick with (single process + multiple
subinterpreters + plugins can't keep state in Python + all my Python
calls are performed on the main thread) for the time being. That's
quite a limited environment, which I hope I can make work in the long
run. And I think the concept of subinterpreters is nice and I'd like
to spend some time on the challenge of improving the situation.

So, I updated my changes and have the following on top of 3.6.1 at the moment:
https://github.com/stephanreiter/cpython/commit/c1afa0c8cdfab862f409f1c7ff02b189f5191cbe

I did what Henry suggested and ran the Python test suite. On Windows,
with my changes I get as output:

357 tests OK.

2 tests failed:
    test_re test_subprocess

46 tests skipped:
    test_bz2 test_crypt test_curses test_dbm_gnu test_dbm_ndbm
    test_devpoll test_epoll test_fcntl test_fork1 test_gdb test_grp
    test_idle test_ioctl test_kqueue test_lzma test_nis test_openpty
    test_ossaudiodev test_pipes test_poll test_posix test_pty test_pwd
    test_readline test_resource test_smtpnet test_socketserver
    test_spwd test_sqlite test_ssl test_syslog test_tcl
    test_threadsignals test_timeout test_tix test_tk test_ttk_guionly
    test_ttk_textonly test_turtle test_urllib2net test_urllibnet
    test_wait3 test_wait4 test_winsound test_xmlrpc_net test_zipfile64

Total duration: 6 min 20 sec
Tests result: FAILURE

I dropped my changes and ran the test suite again using vanilla Python
and got the same result.
So, it seems that the change doesn't break anything that is tested,
but that probably doesn't mean a lot.

Tomorrow, I'll investigate the following situation if I find time:

If we create a fresh OS thread and make it call PyGILState_Ensure, it
won't have a PyThreadState saved under autoTLSkey. That means it will
create one using the main interpreter. I, as the developer embedding
Python into my application and using multiple interpreters, have no
control here. Maybe I know that under current conditions a certain
other interpreter should be used.

I'll try to provoke this situation and then introduce a callback from
Python into my application that will allow me to specify which
interpreter should be used, e.g. code as follows:

// Hypothetical callback: returns the interpreter a new thread should use.
PyInterpreterState *pickAnInterpreter() {
  // nullptr maps to the main interpreter
  return activePlugin ? activePlugin->interpreter : nullptr;
}

PyGILState_SetNewThreadInterpreterSelectionCallback(&pickAnInterpreter);

Maybe it's rubbish, but I think it's a valuable experiment that will
give me a better understanding.
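
To make the experiment concrete, here is a toy model of the selection
logic in plain Python. It is only a sketch and does not call into
CPython: `_tls` stands in for autoTLSkey, interpreters are plain strings,
and `set_new_thread_interpreter_callback` is a hypothetical name that
mirrors the proposed callback, not an existing API.

```python
import threading

MAIN_INTERPRETER = "main"   # stand-in for the main interpreter
_tls = threading.local()    # plays the role of autoTLSkey
_select_interpreter = None  # hypothetical embedder-supplied callback


def set_new_thread_interpreter_callback(callback):
    """Register the hypothetical selection callback (None disables it)."""
    global _select_interpreter
    _select_interpreter = callback


def gilstate_ensure():
    """Return this thread's state, creating it on first use."""
    state = getattr(_tls, "state", None)
    if state is None:
        chosen = _select_interpreter() if _select_interpreter else None
        # A None result falls back to the main interpreter.
        state = {"interpreter": chosen or MAIN_INTERPRETER}
        _tls.state = state
    return state


def interpreter_seen_by_new_thread():
    """Run gilstate_ensure() on a fresh OS thread and report its interpreter."""
    result = []
    worker = threading.Thread(
        target=lambda: result.append(gilstate_ensure()["interpreter"])
    )
    worker.start()
    worker.join()
    return result[0]
```

Without a callback, a fresh thread always lands on the main interpreter;
with one registered, the embedder decides which interpreter it gets.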

Stephan

On Wed, Jan 23, 2019 at 6:11 PM Eric Snow
<ericsnowcurrently at gmail.com> wrote:
>
> Hi Stephan,
>
> On Tue, Jan 22, 2019 at 9:25 AM Stephan Reiter <stephan.reiter at gmail.com> wrote:
> > I am new to the list and arriving with a concrete problem that I'd
> > like to fix myself.
>
> That is great!  Statements like that are a good way to get folks
> interested in your success. :)
>
> > I am embedding Python (3.6) into my C++ application and I would like
> > to run Python scripts isolated from each other using sub-interpreters.
> > I am not using threads; everything is supposed to run in the
> > application's main thread.
>
> FYI, running multiple interpreters in the same (e.g. main) thread
> isn't as well thought out as running them in separate threads.  There
> may be assumptions in the runtime that would cause crashes or
> inconsistency in the runtime, so be vigilant.  Is there a reason not
> to run the subinterpreters in separate threads?
>
> Regarding isolation, keep in mind that there are some limitations.  At
> an intrinsic level subinterpreters are never truly isolated since they
> run in the same process.  This matters if you have concerns about
> security (which you should always consider) and stability (if a
> subinterpreter crashes then your whole process crashes).  You can get
> that complete isolation via subprocess & multiprocessing.
>
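
As a rough illustration of the process-level isolation mentioned above,
the sketch below runs a plugin in a child interpreter via subprocess and
exchanges JSON over stdin/stdout. `PLUGIN_SOURCE`, `call_plugin`, and the
request shape are assumptions made up for this example.

```python
import json
import subprocess
import sys

# Hypothetical plugin: reads a JSON request on stdin, writes a JSON reply.
PLUGIN_SOURCE = """
import json, sys
request = json.load(sys.stdin)
json.dump({"result": sum(request["values"])}, sys.stdout)
"""


def call_plugin(source, request, timeout=30):
    """Run plugin source in a child interpreter and return its JSON reply."""
    completed = subprocess.run(
        [sys.executable, "-c", source],
        input=json.dumps(request),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        timeout=timeout,
    )
    # A crash or non-zero exit in the child surfaces here as an exception
    # instead of taking the host process down.
    if completed.returncode != 0:
        raise RuntimeError("plugin failed: " + completed.stderr.strip())
    return json.loads(completed.stdout)
```

The price is the one discussed elsewhere in this thread: every argument
and result is serialized and copied between processes.
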
> On top of intrinsic isolation, currently subinterpreters have gaps in
> isolation that need fixing.  For instance, they share a lot of
> module-global state, as well as builtin types and singletons.  So data
> can leak between subinterpreters unexpectedly.
>
> Finally, at the Python level subinterpreters don't have a good way to
> pass data around.  (I'm working on that. [1])  Naturally at the C
> level you can keep pointers to objects and share data that way.  Just
> keep in mind that doing so relies on the GIL (in an
> interpreter-per-thread scenario, which you're avoiding).  In a world
> where subinterpreters don't share the GIL [2] (and you're running one
> interpreter per thread) you'll end up with refcounting races, leading
> to crashes.  Just keep that in mind if you decide to switch to
> one-subinterpreter-per-thread.
>
> On Tue, Jan 22, 2019 at 8:09 PM Stephan Reiter <stephan.reiter at gmail.com> wrote:
> > Nathaniel, I'd like to allow Python plugins in my application. A
> > plugin should be allowed to bring its own modules along (i.e.
> > plugin-specific subdir is in sys.path when the plugin is active) and
> > hence some isolation of them will be needed, so that they can use
> > different versions of a given module. That's my main motivation for
> > using subinterpreters.
>
> That's an interesting approach.  Using subinterpreters would indeed
> give you isolation between the sets of imported modules.
>
> As you noticed, you'll run into some problems when extension modules
> are involved.  There aren't any great workarounds yet.
> Subinterpreters are tied pretty tightly to the core runtime so it's
> hard to attack the problem from the outside.  Furthermore,
> subinterpreters aren't widely used yet so folks haven't been very
> motivated to fix the runtime.  (FWIW, that is changing.)
>
> > I thought about running plugins out of process - a separate process
> > for every plugin - and allow them to communicate with my application
> > via RPC. But that makes it more complex to implement the API my
> > application will offer and will slow down things due to the need to
> > copy data.
>
> Yep.  It might be worth it though.  Note that running
> plugins/extensions in separate processes is a fairly common approach
> for a variety of solid technical reasons (e.g. security, stability).
> FWIW, there are some tools available (or soon to be) for sharing data
> more efficiently (e.g. shared memory in multiprocessing, PEP 574).
>
> > Maybe you have another idea for me? :)
>
> * single proc -- keep using subinterpreters
>   + dlmopen or the Windows equivalent (I hesitate to suggest this
> hack, but it might help somewhat with extension modules)
>   + help fix the problems with subinterpreters :)
> * single proc -- no subinterpreters
>   + import hook to put plugins in their own namespace (tricky with
> extension modules)
>   + extend importlib to do the same
>   + swap sys.modules in and out around plugin use
> * multi-proc -- one process per plugin
>   + subprocess
>   + multiprocessing
>
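
The "swap sys.modules in and out around plugin use" option could look
roughly like the sketch below. `plugin_modules` and its per-plugin
`cache` dict are hypothetical names. The sketch assumes pure-Python
modules only; extension modules generally cannot be loaded a second time
this way.

```python
import contextlib
import sys


@contextlib.contextmanager
def plugin_modules(cache, search_path):
    """Temporarily expose one plugin's modules and import path."""
    baseline = dict(sys.modules)
    saved_path = list(sys.path)
    sys.modules.update(cache)        # bring back this plugin's modules
    sys.path.insert(0, search_path)  # plugin directory wins on import
    try:
        yield
    finally:
        # Anything new or replaced belongs to the plugin: remember it in
        # the plugin's cache and remove it from the shared table.
        for name, module in list(sys.modules.items()):
            if baseline.get(name) is not module:
                cache[name] = module
                del sys.modules[name]
        # Restore entries the plugin replaced or removed. sys.modules is
        # mutated in place rather than rebound to a new dict.
        sys.modules.update(baseline)
        sys.path[:] = saved_path
```

Each plugin keeps its own `cache` dict between calls, so two plugins can
import different modules under the same name without seeing each other's
copy.
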
> On Wed, Jan 23, 2019 at 8:48 AM Stephan Reiter <stephan.reiter at gmail.com> wrote:
> > Well, the plugins would be created by third parties and I'd like to
> > let them bundle modules with their plugins.
> > I am afraid of modules with the same name, but being different, or
> > different versions of modules being used by different plugins. If
> > plugins share an interpreter, the module with a given name that is
> > imported first sticks around forever and for all plugins.
> >
> > I am thinking about this design:
> > - Plugins don't maintain state in their Python world. They expose
> > functions, my application calls them.
> > - Every time I call into them, they are presented with a clean global
> > namespace. After the call, the namespace (dict) is thrown away. That
> > releases any objects the plugin code has created.
> > - So, then I could also actively unload modules they loaded. But I do
> > know that this is problematic in particular for modules that use
> > native code.
> >
> > I am interested in both a short-term and a long-term solution.
> > Actually, making subinterpreters work better is pretty sexy ...
> > because it's hard. :-)
>
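
The "clean global namespace per call" design described above can be
sketched in a few lines of Python. `call_plugin_function` is a
hypothetical helper, and the sketch does not attempt the module
unloading step, which is the problematic part for native code.

```python
def call_plugin_function(source, function_name, *args):
    """Run plugin source in a throwaway namespace and call one function."""
    namespace = {"__name__": "plugin"}
    exec(compile(source, "<plugin>", "exec"), namespace)
    try:
        return namespace[function_name](*args)
    finally:
        # Dropping the dict contents releases the objects the plugin
        # created, unless the return value or an imported module still
        # references them.
        namespace.clear()
```

Because the source is executed again for each call, module-level state
such as a counter starts from scratch every time.
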
> Petr noted that a number of people are working on getting
> subinterpreters to a good place.  That includes me. [1][2] :)  We'd
> welcome any help!
>
> -eric
>
>
> [1] https://www.python.org/dev/peps/pep-0554/
> [2] https://github.com/ericsnowcurrently/multi-core-python