
On Mon, Jun 22, 2015 at 10:37 AM, Gregory P. Smith <greg@krypto.org> wrote:
This is an important oddity of subinterpreters: they have to re-import everything other than extension modules. When you've got a big process with a ton of modules (like, say, 100s of protocol buffers...), that's going to be a non-starter (pun intended) for using threads+subinterpreters as a fast form of concurrency if each subinterpreter needs to import most of them itself. Startup latency and CPU usage += lots. (It possibly uses more memory as well, but given that our existing refcount implementation forces needless PyObject page writes even during reads, causing fork to copy-on-write... impossible to guess.)
What this means for subinterpreters in this case is not much different from starting up multiple worker processes: You need to start them up and wait for them to be ready to serve, then reuse them as long as feasible before recycling them to start up a new one. The startup cost is high.
One possibility would be for subinterpreters to copy modules from the main interpreter -- I guess your average module is mostly dicts, strings, type objects, and functions; strings and functions are already immutable and could be shared without copying, and I guess copying the dicts and type objects into the subinterpreter is much cheaper than hitting the disk etc. to do a real import. (Though certainly not free.) This would have interesting semantic implications -- it would give similar effects to fork(), with subinterpreters starting from a snapshot of the main interpreter's global state.
I'm not entirely sold on this overall proposal, but I think a result of it could be to make our subinterpreter support better which would be a good thing.
We have had to turn people away from subinterpreters in the past for use as part of their multithreaded C++ server, where they wanted to occasionally run some Python code in embedded interpreters as part of serving some requests. Doing that would suddenly single-thread their application (GIIIIIIL!) for all requests currently executing Python code, despite the multiple subinterpreters.
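The serialization effect is easy to see with plain threads in one interpreter. At the time of this thread, subinterpreters in a process shared that same lock. Below is a minimal sketch; the function names are hypothetical, and the timings depend on the machine. On a standard GIL build of CPython, the threaded wall time for pure-Python CPU-bound work is typically about the same as, or worse than, running the same work sequentially. A free-threaded build would behave differently.

```python
import threading
import time


def cpu_bound(n):
    # Pure-Python loop: on a GIL build only one thread can run this at a time.
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_in_threads(n, num_threads):
    """Run cpu_bound(n) on num_threads threads; return (results, seconds)."""
    results = [None] * num_threads

    def worker(idx):
        results[idx] = cpu_bound(n)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, time.perf_counter() - start


def run_sequential(n, count):
    """Run cpu_bound(n) count times in a row; return (results, seconds)."""
    start = time.perf_counter()
    results = [cpu_bound(n) for _ in range(count)]
    return results, time.perf_counter() - start


if __name__ == "__main__":
    _, t_threads = run_in_threads(2_000_000, 4)
    _, t_seq = run_sequential(2_000_000, 4)
    print(f"4 threads: {t_threads:.2f}s, sequential: {t_seq:.2f}s")
```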
I've also talked to HPC users who discovered this problem the hard way (e.g. http://www-atlas.lbl.gov/, folks working on the Large Hadron Collider): they've been using Python as an extension language in some large physics codes but are now porting those bits to C++ because of the GIL issues. (In this context startup overhead should be easily amortized, but switching to an RPC model is not going to happen.)

-n

-- 
Nathaniel J. Smith -- http://vorpus.org