Re: [Python-ideas] solving multi-core Python

June 22, 2015

      On Mon, Jun 22, 2015 at 10:37 AM, Gregory P. Smith <greg@krypto.org> wrote:
...
> This is an important oddity of subinterpreters: They have to re-import
> everything other than extension modules. When you've got a big process with
> a ton of modules (like, say, 100s of protocol buffers...), that's going to
> be a non-starter (pun intended) for the use of threads+subinterpreters as a
> fast form of concurrency if they need to import most of those from each
> subinterpreter. Startup latency and CPU usage += lots. (It possibly uses more
> memory as well, but given that our existing refcount implementation forces
> needless PyObject page writes during reads, causing fork to copy-on-write...
> impossible to guess.)
>
> What this means for subinterpreters in this case is not much different from
> starting up multiple worker processes: You need to start them up and wait
> for them to be ready to serve, then reuse them as long as feasible before
> recycling them to start up a new one. The startup cost is high.

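The start-up-then-reuse pattern described above can be sketched as a pool that pays the expensive start-up once per worker, reuses each worker for up to `max_tasks` requests, and then recycles it. This is a hypothetical, single-threaded model. `Worker`, `RecyclingPool` and `max_tasks` are illustrative names, and `time.sleep` stands in for the real start-up cost, such as re-importing modules in a fresh subinterpreter or process. For real worker processes, the standard library's `multiprocessing.Pool` exposes the same knobs through its `initializer` and `maxtasksperchild` arguments.

```python
import itertools
import time


class Worker:
    """Hypothetical worker whose start-up is expensive (e.g. re-importing modules)."""

    def __init__(self, startup_seconds: float) -> None:
        time.sleep(startup_seconds)  # stand-in for importing everything again
        self.tasks_served = 0

    def handle(self, request: int) -> int:
        self.tasks_served += 1
        return request * 2  # placeholder for real work


class RecyclingPool:
    """Start workers up front, reuse them, recycle each after max_tasks requests."""

    def __init__(self, size: int, max_tasks: int, startup_seconds: float = 0.01) -> None:
        self.max_tasks = max_tasks
        self.startup_seconds = startup_seconds
        self.startups = 0
        # Pay the start-up cost before serving anything ("wait for them to be ready").
        self.workers = [self._start_worker() for _ in range(size)]
        self._next_slot = itertools.cycle(range(size))

    def _start_worker(self) -> Worker:
        self.startups += 1
        return Worker(self.startup_seconds)

    def submit(self, request: int) -> int:
        slot = next(self._next_slot)
        result = self.workers[slot].handle(request)
        if self.workers[slot].tasks_served >= self.max_tasks:
            # Recycle: replace the worn-out worker with a freshly started one.
            self.workers[slot] = self._start_worker()
        return result


pool = RecyclingPool(size=4, max_tasks=25)
results = [pool.submit(i) for i in range(200)]
print(pool.startups)  # → 12
```

Here 200 requests cost 12 start-ups (4 initial plus 2 recycles per slot) instead of 200. Raising `max_tasks` amortizes the start-up further at the price of keeping long-lived workers around.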
One possibility would be for subinterpreters to copy modules from the
main interpreter -- I guess your average module is mostly dicts,
strings, type objects, and functions; strings and functions are
already immutable and could be shared without copying, and I guess
copying the dicts and type objects into the subinterpreter is much
cheaper than hitting the disk etc. to do a real import. (Though
certainly not free.)
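A rough, pure-Python illustration of the copy-instead-of-import idea follows. `snapshot_module` is a hypothetical helper, not a CPython API. It builds a new module object without re-executing the module's code, shares strings, functions and classes by reference, and gives the clone its own shallow copies of top-level containers. Unlike the suggestion above, it does not copy type objects. It also treats functions as shareable, although Python function objects are not strictly immutable: attributes such as `__defaults__` can be reassigned.

```python
import copy
import types

# Assumption for this sketch: these top-level containers are the mutable
# state a second interpreter must own; everything else is shared as-is.
MUTABLE_CONTAINERS = (dict, list, set, bytearray)


def snapshot_module(mod: types.ModuleType) -> types.ModuleType:
    """Hypothetical helper: clone a module's namespace without re-importing it."""
    clone = types.ModuleType(mod.__name__, mod.__doc__)
    for name, value in vars(mod).items():
        if isinstance(value, MUTABLE_CONTAINERS):
            value = copy.copy(value)  # shallow copy: nested objects are still shared
        setattr(clone, name, value)
    return clone


# Build a small stand-in for an already-imported module.
src = types.ModuleType("demo")
exec(
    "GREETING = 'hello'\n"
    "CONFIG = {'debug': False}\n"
    "def bump(x):\n"
    "    return x + 1\n",
    vars(src),
)

clone = snapshot_module(src)
clone.CONFIG["debug"] = True     # mutates only the clone's copy
print(src.CONFIG["debug"])       # → False
print(clone.bump is src.bump)    # → True (function shared, not copied)
```

The shared functions still resolve globals through the original module's namespace (their `__globals__`), so a real implementation would presumably have to rebind them too. The sketch shows only the cheap-copy half of the idea.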

This would have interesting semantic implications -- it would give
similar effects to fork(), with subinterpreters starting from a
snapshot of the main interpreter's global state.
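The refcount aside quoted above is the catch with fork-style snapshots. CPython stores an object's reference count in the object's own header, so merely taking a new reference writes to the memory page that holds the object, even when its contents stay untouched. After `fork()`, that write forces the page to be copied. The minimal demonstration below uses `sys.getrefcount`. Absolute counts vary between CPython versions, so only the difference matters here. Since Python 3.12, immortal objects such as `None` and small ints are exempt (PEP 683), so the example uses a freshly created list.

```python
import sys

payload = [1, 2, 3]                   # a freshly created, non-immortal object
before = sys.getrefcount(payload)

alias = payload                       # a pure "read": the list's contents are untouched
after = sys.getrefcount(payload)

# The reference count lives in the object's header, so binding a new name
# wrote to the object's memory even though nothing about the list changed.
print(after - before)                 # → 1
```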
...
> I'm not entirely sold on this overall proposal, but I think a result of it
> could be to make our subinterpreter support better, which would be a good
> thing.
>
> We have had to turn people away from subinterpreters in the past for use as
> part of their multithreaded C++ server where they wanted to occasionally run
> some Python code in embedded interpreters as part of serving some requests.
> Doing that would suddenly single-thread their application (GIIIIIIL!) for
> all requests currently executing Python code despite multiple
> subinterpreters.

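The serialization described above can be modelled with a single process-wide lock that every "interpreter" must hold while running Python code. This reflects the design at the time of this thread, in which all subinterpreters in a process shared one GIL. The sketch is a hypothetical toy: `ToyInterpreter`, `PROCESS_GIL` and the timings are illustrative, and `time.sleep` under the lock stands in for executing bytecode.

```python
import threading
import time

# One lock per process, shared by every embedded interpreter: a toy stand-in for the GIL.
PROCESS_GIL = threading.Lock()


class ToyInterpreter:
    """Hypothetical subinterpreter: its own state, but the same process-wide lock."""

    def run_python(self, seconds: float) -> None:
        with PROCESS_GIL:  # running Python code requires the shared lock
            time.sleep(seconds)  # stand-in for executing bytecode while holding it


def serve(n_requests: int, seconds: float) -> float:
    """Simulate a multithreaded server: one OS thread and one interpreter per request."""
    interpreters = [ToyInterpreter() for _ in range(n_requests)]
    threads = [
        threading.Thread(target=interp.run_python, args=(seconds,))
        for interp in interpreters
    ]
    start = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.monotonic() - start


elapsed = serve(n_requests=8, seconds=0.05)
# The requests ran one after another, so this is at least about 8 * 0.05 s.
print(f"{elapsed:.2f} s")
```

Giving each `ToyInterpreter` its own lock instead of `PROCESS_GIL` would let the eight requests overlap in this model, finishing in roughly 0.05 s. That gap is what the embedding users were hoping subinterpreters would provide.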
I've also talked to HPC users who discovered this problem the hard way
(e.g. http://www-atlas.lbl.gov/, folks working on the Large Hadron
Collider) -- they've been using Python as an extension language in
some large physics codes but are now porting those bits to C++ because
of the GIL issues. (In this context startup overhead should be easily
amortized, but switching to an RPC model is not going to happen.)

-n

-- 
Nathaniel J. Smith -- http://vorpus.org
