[Python-ideas] solving multi-core Python

Gregory P. Smith greg at krypto.org
Mon Jun 22 19:37:01 CEST 2015


On Sun, Jun 21, 2015 at 4:56 AM Devin Jeanpierre <jeanpierreda at gmail.com>
wrote:

> On Sat, Jun 20, 2015 at 4:16 PM, Eric Snow <ericsnowcurrently at gmail.com>
> wrote:
> >
> > On Jun 20, 2015 4:55 PM, "Devin Jeanpierre" <jeanpierreda at gmail.com>
> wrote:
> >>
> >> It's worthwhile to consider fork as an alternative.  IMO we'd get a
> >> lot out of making forking safer, easier, and more efficient. (e.g.
> >> respectively: adding an atfork registration mechanism; separating out
> >> the bits of multiprocessing that use pickle from those that don't;
> >> moving the refcount to a separate page, or allowing it to be frozen
> >> prior to a fork.)
> >
> > So leverage a common base of code with the multiprocessing module?
>
> What is this question in response to? I don't understand.
>
> > I would expect subinterpreters to use less memory.  Furthermore creating
> > them would be significantly faster.  Passing objects between them would
> be
> > much more efficient.  And, yes, cross-platform.
>
> Maybe I don't understand how subinterpreters work. AIUI, the whole
> point of independent subinterpreters is that they share no state. So
> if I have a web server, each independent serving thread has to do all
> of the initialization (import HTTP libraries, etc.), right? Compare
> with forking, where the initialization is all done and then you fork,
> and you are immediately ready to serve, using the data structures
> shared with all the other workers, which is only copied when it is
> written to.
>

Unfortunately CPython subinterpreters do share some state, though it is not
visible to the running code in many cases.  Thus the other mentions of
"wouldn't it be nice if CPython didn't assume a single global state per
process" (100% agreed, but tangential to this discussion)...

https://docs.python.org/3/c-api/init.html#sub-interpreter-support
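
For anyone who hasn't used that API, here's a rough sketch (untested,
error handling omitted) of what creating and tearing down a
subinterpreter looks like from embedding C code.  The key point is that
each Py_NewInterpreter() call gets its own sys.modules, so imports are
repeated per interpreter:

    #include <Python.h>

    int main(void)
    {
        Py_Initialize();                           /* main interpreter */
        PyThreadState *main_ts = PyThreadState_Get();

        PyThreadState *sub = Py_NewInterpreter();  /* fresh, mostly-isolated state */
        /* Runs in the subinterpreter: this import is not shared with the
           main interpreter and has to be done again here. */
        PyRun_SimpleString("import json\n");
        Py_EndInterpreter(sub);                    /* sub must be current here */

        PyThreadState_Swap(main_ts);               /* restore the main interpreter */
        Py_Finalize();
        return 0;
    }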

You are correct that some things that could make sense to share, such as
imported modules, would not be shared as they are in a forked environment.

This is an important oddity of subinterpreters: they have to re-import
everything other than extension modules. When you've got a big process with
a ton of modules (like, say, 100s of protocol buffers...), that's going to
be a non-starter (pun intended) for using threads+subinterpreters as a fast
form of concurrency if most of those modules have to be imported again in
each subinterpreter. Startup latency and CPU usage += lots. (They possibly
use more memory as well, but it's impossible to guess, given that our
existing refcount implementation forces needless writes to PyObject pages
during reads, which makes fork copy-on-write those pages anyway.)
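
For contrast, a sketch of the fork-based setup being compared against
(POSIX only, untested): the expensive imports happen exactly once in the
parent, and each forked worker inherits them copy-on-write rather than
redoing them.  "json" and "http.server" are just stand-ins for the real
pile of modules:

    #include <Python.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        Py_Initialize();
        /* Pay the import cost once, up front, before forking. */
        PyRun_SimpleString("import json, http.server\n");

        for (int i = 0; i < 4; i++) {              /* arbitrary worker count */
            if (fork() == 0) {
                /* Child: modules are already there, start serving. */
                PyRun_SimpleString("print('worker ready')\n");
                Py_Finalize();
                _exit(0);
            }
        }
        while (wait(NULL) > 0) {}                  /* parent reaps workers */
        Py_Finalize();
        return 0;
    }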

What this means for subinterpreters in this case is not much different from
starting up multiple worker processes: You need to start them up and wait
for them to be ready to serve, then reuse them as long as feasible before
recycling them to start up a new one. The startup cost is high.

I'm not entirely sold on this overall proposal, but I think a result of it
*could* be to make our subinterpreter support better which would be a good
thing.

We have had to turn people away from subinterpreters in the past for use as
part of their multithreaded C++ server where they wanted to occasionally
run some Python code in embedded interpreters as part of serving some
requests. Doing that would suddenly single-thread their application
(GIIIIIIL!) for all requests currently executing Python code, despite the
multiple subinterpreters. The general advice for that: Run multiple Python
processes and make RPCs to those from the C++ code. It allows for
parallelism and ultimately scales better, if ever needed, as it can be
easily spread across machines. Which one is more complex to maintain? Good
question.
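
The mechanics of why it single-threads, roughly: every thread that wants
to run Python, whichever subinterpreter it targets, has to take the same
process-wide GIL first.  A sketch of the per-request pattern (not
production code; the Python one-liner is just a stand-in for real work):

    #include <Python.h>

    /* Called from a C++ server's worker thread to run a bit of Python. */
    static void handle_request_with_python(PyInterpreterState *interp)
    {
        /* One thread state per (thread, interpreter) pair. */
        PyThreadState *ts = PyThreadState_New(interp);

        PyEval_AcquireThread(ts);    /* blocks until the single GIL is free */
        PyRun_SimpleString("sum(range(10**6))  # stand-in for request work");
        PyThreadState_Clear(ts);     /* must happen while the GIL is held */
        PyEval_ReleaseThread(ts);    /* now another thread can run Python */

        PyThreadState_Delete(ts);
    }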

-gps


>
> Re passing objects, see below.
>
> I do agree it's cross-platform, but right now that's the only thing I
> agree with.
>
> >> Note: I don't count the IPC cost of forking, because at least on
> >> linux, any way to efficiently share objects between independent
> >> interpreters in separate threads can also be ported to independent
> >> interpreters in forked subprocesses,
> >
> > How so?  Subinterpreters are in the same process.  For this proposal each
> > would be on its own thread.  Sharing objects between them through
> channels
> > would be more efficient than IPC.  Perhaps I've missed something?
>
> You might be missing that memory can be shared between processes, not
> just threads, but I don't know.
>
> The reason passing objects between processes is so slow is currently
> *nearly entirely* the cost of serialization. That is, it's the fact
> that you are passing an object to an entirely separate interpreter,
> and need to serialize the whole object graph and so on. If you can
> make that fast without serialization, for shared memory threads, then
> all the serialization becomes unnecessary, and you can either write to
> a pipe (fast, if it's a non-container), or use shared memory from the
> beginning (instantaneous). This is possible on any POSIX OS. Linux
> lets you go even further.
>
> -- Devin
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
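
To make that last quoted point concrete, here's roughly what the
no-serialization path looks like at the OS level (POSIX shared memory;
untested, error handling omitted, names and sizes arbitrary, link with
-lrt on some Linux systems).  Both processes see the same bytes with zero
copies and zero pickling:

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = shm_open("/demo_shm", O_CREAT | O_RDWR, 0600);
        ftruncate(fd, 4096);
        char *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

        if (fork() == 0) {                 /* child writes into the shared page */
            strcpy(buf, "hello, no serialization involved");
            _exit(0);
        }
        wait(NULL);
        printf("parent sees: %s\n", buf);  /* same memory, zero copies */

        munmap(buf, 4096);
        close(fd);
        shm_unlink("/demo_shm");
        return 0;
    }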