[Python-ideas] solving multi-core Python

Wed Jun 24 07:26:08 CEST 2015

On Sun, Jun 21, 2015 at 5:55 AM, Devin Jeanpierre
<jeanpierreda at gmail.com> wrote:
> On Sat, Jun 20, 2015 at 4:16 PM, Eric Snow <ericsnowcurrently at gmail.com> wrote:
>>
>> On Jun 20, 2015 4:55 PM, "Devin Jeanpierre" <jeanpierreda at gmail.com> wrote:
>>>
>>> It's worthwhile to consider fork as an alternative.  IMO we'd get a
>>> lot out of making forking safer, easier, and more efficient. (e.g.
>>> respectively: adding an atfork registration mechanism; separating out
>>> the bits of multiprocessing that use pickle from those that don't;
>>> moving the refcount to a separate page, or allowing it to be frozen
>>> prior to a fork.)
>>
>> So leverage a common base of code with the multiprocessing module?
>
> What is this question in response to? I don't understand.

It sounded like you were suggesting that we factor out a common code
base that could be used by multiprocessing and the other machinery and
that only multiprocessing would keep the pickle-related code.

>
>> I would expect subinterpreters to use less memory.  Furthermore creating
>> them would be significantly faster.  Passing objects between them would be
>> much more efficient.  And, yes, cross-platform.
>
> Maybe I don't understand how subinterpreters work. AIUI, the whole
> point of independent subinterpreters is that they share no state. So
> if I have a web server, each independent serving thread has to do all
> of the initialization (import HTTP libraries, etc.), right?

Yes.  However, I expect that we could mitigate that cost to some extent.

> Compare
> with forking, where the initialization is all done and then you fork,
> and you are immediately ready to serve, using the data structures
> shared with all the other workers, which is only copied when it is
> written to. So forking starts up faster and uses less memory (due to
> shared memory.)

But we are aiming for a share-nothing model with an efficient
object-passing mechanism.  Furthermore, subinterpreters do not have to
be single-use.  My proposal includes running tasks in an existing
subinterpreter (e.g. executor pool), so that start-up cost is
mitigated in cases where it matters.
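
A minimal sketch of that reuse idea, to make the amortization concrete.
Subinterpreters are not exposed here, so this swaps in threads from
concurrent.futures as a stand-in: each worker pays a start-up cost once
and then serves many tasks.  The names run_tasks, expensive_init and
handle are hypothetical, and the doubling "work" is a placeholder.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_tasks(requests, workers=2):
    """Run `requests` on a small pool of long-lived workers.

    Returns (results, number_of_worker_initializations).
    """
    local = threading.local()   # per-worker state, not shared between workers
    lock = threading.Lock()
    init_calls = []

    def expensive_init():
        # Hypothetical stand-in for per-worker start-up cost
        # (importing HTTP libraries, loading configuration, ...).
        local.ready = True
        with lock:
            init_calls.append(threading.get_ident())

    def handle(request):
        # Every task lands in a worker that is already initialized.
        assert local.ready
        return request * 2

    with ThreadPoolExecutor(max_workers=workers,
                            initializer=expensive_init) as pool:
        results = list(pool.map(handle, requests))
    return results, len(init_calls)
```

Calling run_tasks(range(10), workers=2) handles ten tasks while running
the initializer at most twice, once per worker rather than once per task.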

Note that ultimately my goal is to make it obvious and undeniable that
Python (3.6+) has a good multi-core story.  In my proposal,
subinterpreters are a means to an end.  If there's a better solution
then great!  As long as the real goal is met I'll be satisfied. :)
For now I'm still confident that the subinterpreter approach is the
best option for meeting the goal.

>
> Re passing objects, see below.
>
> I do agree it's cross-platform, but right now that's the only thing I
> agree with.
>
>>> Note: I don't count the IPC cost of forking, because at least on
>>> Linux, any way to efficiently share objects between independent
>>> interpreters in separate threads can also be ported to independent
>>> interpreters in forked subprocesses,
>>
>> How so?  Subinterpreters are in the same process.  For this proposal each
>> would be on its own thread.  Sharing objects between them through channels
>> would be more efficient than IPC.  Perhaps I've missed something?
>
> You might be missing that memory can be shared between processes, not
> just threads, but I don't know.
>
> The reason passing objects between processes is so slow is currently
> *nearly entirely* the cost of serialization. That is, it's the fact
> that you are passing an object to an entirely separate interpreter,
> and need to serialize the whole object graph and so on. If you can
> make that fast without serialization,

That is a worthy goal!

> for shared memory threads, then
> all the serialization becomes unnecessary, and you can either write to
> a pipe (fast, if it's a non-container), or use shared memory from the
> beginning (instantaneous). This is possible on any POSIX OS. Linux
> lets you go even further.

And this is faster than passing objects around within the same
process?  Does it play well with Python's memory model?
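
For reference, the shared-memory-across-processes idea in the quoted
text can be sketched as below.  This is POSIX-only and covers only flat,
fixed-layout data (a single 64-bit integer); the function name
share_int_with_child is hypothetical.  It does not show how to share
arbitrary Python objects, which is where the memory-model question
comes in.

```python
import mmap
import os
import struct


def share_int_with_child(value):
    """Fork a child that writes `value` into memory shared with the parent."""
    # Anonymous mapping; on Unix, mmap defaults to MAP_SHARED, so these
    # pages stay shared (not copy-on-write) across fork().
    buf = mmap.mmap(-1, struct.calcsize("q"))
    pid = os.fork()
    if pid == 0:
        # Child: write raw bytes straight into the shared pages, then
        # exit without running the parent's cleanup handlers.
        try:
            struct.pack_into("q", buf, 0, value)
        finally:
            os._exit(0)
    # Parent: wait for the child, then read what it wrote.
    os.waitpid(pid, 0)
    (result,) = struct.unpack_from("q", buf, 0)
    buf.close()
    return result
```

The parent sees the child's write without a pipe or pickle, but only
because the payload is plain bytes with a layout both sides agree on.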

-eric