
On Sun, Jun 21, 2015 at 5:55 AM, Devin Jeanpierre <jeanpierreda@gmail.com> wrote:
On Sat, Jun 20, 2015 at 4:16 PM, Eric Snow <ericsnowcurrently@gmail.com> wrote:
On Jun 20, 2015 4:55 PM, "Devin Jeanpierre" <jeanpierreda@gmail.com> wrote:
It's worthwhile to consider fork as an alternative. IMO we'd get a lot out of making forking safer, easier, and more efficient. (e.g. respectively: adding an atfork registration mechanism; separating out the bits of multiprocessing that use pickle from those that don't; moving the refcount to a separate page, or allowing it to be frozen prior to a fork.)
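A minimal sketch of what such an atfork registration mechanism looks like in practice; the lock-fixup example is illustrative, though the registration API shown here did eventually land in CPython 3.7 as os.register_at_fork():

    import os
    import threading

    _log_lock = threading.Lock()   # example of state that breaks across fork()

    def _before_fork():
        _log_lock.acquire()        # don't let the child inherit a half-updated lock

    def _after_in_parent():
        _log_lock.release()

    def _after_in_child():
        global _log_lock
        _log_lock = threading.Lock()   # the child re-creates its copy of the lock

    # available since CPython 3.7; at the time of this thread it was only a proposal
    os.register_at_fork(before=_before_fork,
                        after_in_parent=_after_in_parent,
                        after_in_child=_after_in_child)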
So leverage a common base of code with the multiprocessing module?
What is this question in response to? I don't understand.
It sounded like you were suggesting that we factor out a common code base that could be used by multiprocessing and the other machinery and that only multiprocessing would keep the pickle-related code.
I would expect subinterpreters to use less memory. Furthermore, creating them would be significantly faster. Passing objects between them would be much more efficient. And, yes, cross-platform.
Maybe I don't understand how subinterpreters work. AIUI, the whole point of independent subinterpreters is that they share no state. So if I have a web server, each independent serving thread has to do all of the initialization (import HTTP libraries, etc.), right?
Yes. However, I expect that we could mitigate that cost to some extent.
Compare with forking, where the initialization is all done and then you fork, and you are immediately ready to serve, using the data structures shared with all the other workers, which are only copied when written to. So forking starts up faster and uses less memory (due to shared memory).
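A rough sketch of the prefork pattern being described, assuming a POSIX system; the routes data is hypothetical, the point is that start-up work happens once in the parent and each forked worker serves immediately from copy-on-write memory:

    import os
    import socket

    # expensive start-up done once, before forking ("import HTTP libraries, etc.")
    ROUTES = {"/": b"hello"}          # hypothetical preloaded data, shared copy-on-write

    listener = socket.socket()
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", 8000))
    listener.listen(64)

    for _ in range(4):                # four workers
        if os.fork() == 0:            # child: ready to serve immediately, no re-init
            while True:
                conn, _ = listener.accept()
                conn.sendall(b"HTTP/1.0 200 OK\r\n\r\n" + ROUTES["/"])
                conn.close()

    for _ in range(4):                # parent just waits on the workers
        os.wait()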
But we are aiming for a share-nothing model with an efficient object-passing mechanism. Furthermore, subinterpreters do not have to be single-use. My proposal includes running tasks in an existing subinterpreter (e.g. executor pool), so that start-up cost is mitigated in cases where it matters.

Note that ultimately my goal is to make it obvious and undeniable that Python (3.6+) has a good multi-core story. In my proposal, subinterpreters are a means to an end. If there's a better solution then great! As long as the real goal is met I'll be satisfied. :) For now I'm still confident that the subinterpreter approach is the best option for meeting the goal.
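To make the reuse point concrete: the executor-pool pattern already exists for threads and processes in concurrent.futures, and a subinterpreter-backed executor would presumably have the same shape. The example below uses processes only as a stand-in, since no subinterpreter API is exposed at the Python level here:

    from concurrent.futures import ProcessPoolExecutor
    import json    # stands in for expensive initialization paid once per worker

    def handle(payload):
        # each call reuses an already-initialized worker, so per-task start-up
        # cost is avoided -- the same property a pool of long-lived
        # subinterpreters would provide
        return json.loads(payload)["x"]

    if __name__ == "__main__":
        with ProcessPoolExecutor(max_workers=4) as pool:
            print(list(pool.map(handle, ['{"x": 1}', '{"x": 2}', '{"x": 3}'])))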
Re passing objects, see below.
I do agree it's cross-platform, but right now that's the only thing I agree with.
Note: I don't count the IPC cost of forking, because at least on linux, any way to efficiently share objects between independent interpreters in separate threads can also be ported to independent interpreters in forked subprocesses,
How so? Subinterpreters are in the same process. For this proposal each would be on its own thread. Sharing objects between them through channels would be more efficient than IPC. Perhaps I've missed something?
You might be missing that memory can be shared between processes, not just threads, but I don't know.
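For reference, a minimal POSIX demonstration of that point: an anonymous mmap created before fork() is visible to both processes, so bytes written on one side appear on the other with no copying or pickling:

    import mmap
    import os

    buf = mmap.mmap(-1, 4096)     # anonymous shared mapping, inherited across fork()

    pid = os.fork()
    if pid == 0:                  # child writes into the shared page...
        buf[:5] = b"hello"
        os._exit(0)

    os.waitpid(pid, 0)
    print(buf[:5])                # ...and the parent reads it back: b'hello'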
The reason passing objects between processes is so slow is currently *nearly entirely* the cost of serialization. That is, it's the fact that you are passing an object to an entirely separate interpreter, and need to serialize the whole object graph and so on. If you can make that fast without serialization,
That is a worthy goal!
for shared memory threads, then all the serialization becomes unnecessary, and you can either write to a pipe (fast, if it's a non-container), or use shared memory from the beginning (instantaneous). This is possible on any POSIX OS. Linux lets you go even further.
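A rough illustration of the distinction being drawn here: a flat, fixed-size value crosses a pipe as raw bytes with no serialization step at all, while a container pays the cost of pickling its whole object graph before any bytes move:

    import os
    import pickle
    import struct
    import time

    # non-container: raw bytes through a pipe, no serialization step
    r, w = os.pipe()
    os.write(w, struct.pack("d", 3.14159))
    value, = struct.unpack("d", os.read(r, 8))

    # container: the dominant cost is serializing the object graph itself
    obj = {"rows": [list(range(100)) for _ in range(1000)]}
    t0 = time.time()
    blob = pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)
    print("pickled %d bytes in %.4f s" % (len(blob), time.time() - t0))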
And this is faster than passing objects around within the same process? Does it play well with Python's memory model? -eric