[Python-ideas] solving multi-core Python
Eric Snow ericsnowcurrently at gmail.com
Wed Jun 24 07:26:08 CEST 2015
On Sun, Jun 21, 2015 at 5:55 AM, Devin Jeanpierre
<jeanpierreda at gmail.com> wrote:
> On Sat, Jun 20, 2015 at 4:16 PM, Eric Snow <ericsnowcurrently at gmail.com> wrote:
>> On Jun 20, 2015 4:55 PM, "Devin Jeanpierre" <jeanpierreda at gmail.com> wrote:
>>> It's worthwhile to consider fork as an alternative. IMO we'd get a
>>> lot out of making forking safer, easier, and more efficient. (e.g.
>>> respectively: adding an atfork registration mechanism; separating out
>>> the bits of multiprocessing that use pickle from those that don't;
>>> moving the refcount to a separate page, or allowing it to be frozen
>>> prior to a fork.)
>> So leverage a common base of code with the multiprocessing module?
> What is this question in response to? I don't understand.
It sounded like you were suggesting that we factor out a common code
base that both multiprocessing and the new machinery could use, with
only multiprocessing keeping the pickle-related code.
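As a concrete sketch of the atfork idea (illustrative only -- the
registry and function names below are made up, not an existing API):

    import os

    # Hypothetical registry of callbacks to run in the child after fork().
    _atfork_child_callbacks = []

    def register_at_fork(callback):
        _atfork_child_callbacks.append(callback)

    def fork_with_callbacks():
        pid = os.fork()
        if pid == 0:
            # In the child: re-initialize locks, loggers, etc.
            for callback in _atfork_child_callbacks:
                callback()
        return pid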
>> I would expect subinterpreters to use less memory. Furthermore creating
>> them would be significantly faster. Passing objects between them would be
>> much more efficient. And, yes, cross-platform.
> Maybe I don't understand how subinterpreters work. AIUI, the whole
> point of independent subinterpreters is that they share no state. So
> if I have a web server, each independent serving thread has to do all
> of the initialization (import HTTP libraries, etc.), right?
Yes. However, I expect that we could mitigate that cost to some extent.
> with forking, where the initialization is all done and then you fork,
> and you are immediately ready to serve, using the data structures
> shared with all the other workers, which is only copied when it is
> written to. So forking starts up faster and uses less memory (due to
> shared memory).
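For concreteness, the pre-fork pattern you're describing is roughly
the following (a minimal sketch, POSIX-only; names are illustrative):

    import os

    def expensive_init():
        # Heavy one-time setup: imports, config parsing, cache warming.
        return {"routes": {"/": b"hello"}}

    def serve(state):
        # Placeholder worker loop; a real server would accept
        # connections here.  The pages backing `state` stay shared
        # (copy-on-write) until a worker writes to them.
        print(os.getpid(), "ready with", sorted(state))

    state = expensive_init()    # paid once, before forking
    for _ in range(4):
        if os.fork() == 0:      # each child inherits the warm state
            serve(state)
            os._exit(0)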
But we are aiming for a share-nothing model with an efficient
object-passing mechanism. Furthermore, subinterpreters do not have to
be single-use. My proposal includes running tasks in an existing
subinterpreter (e.g. an executor pool), so that start-up cost is
mitigated in the cases where it matters.
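A rough sketch of what I have in mind (everything here is
hypothetical -- there is no `interpreters` module today, and
`create()`/`run()` are invented for the example):

    import queue
    import threading

    import interpreters   # hypothetical module, for illustration only

    class SubinterpreterExecutor:

        def __init__(self, nworkers=4):
            self._tasks = queue.Queue()
            for _ in range(nworkers):
                interp = interpreters.create()   # start-up cost paid once
                thread = threading.Thread(
                    target=self._worker, args=(interp,), daemon=True)
                thread.start()

        def _worker(self, interp):
            while True:
                source = self._tasks.get()
                interp.run(source)   # reuse the already-warm interpreter

        def submit(self, source):
            # `source` is a string of Python code; nothing is shared.
            self._tasks.put(source)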
Note that ultimately my goal is to make it obvious and undeniable that
Python (3.6+) has a good multi-core story. In my proposal,
subinterpreters are a means to an end. If there's a better solution
then great! As long as the real goal is met I'll be satisfied. :)
For now I'm still confident that the subinterpreter approach is the
best option for meeting the goal.
> Re passing objects, see below.
> I do agree it's cross-platform, but right now that's the only thing I
> agree with.
>>> Note: I don't count the IPC cost of forking, because at least on
>>> Linux, any way to efficiently share objects between independent
>>> interpreters in separate threads can also be ported to independent
>>> interpreters in forked subprocesses,
>> How so? Subinterpreters are in the same process. For this proposal each
>> would be on its own thread. Sharing objects between them through channels
>> would be more efficient than IPC. Perhaps I've missed something?
> You might be missing that memory can be shared between processes, not
> just threads, but I don't know.
> The reason passing objects between processes is so slow is currently
> *nearly entirely* the cost of serialization. That is, it's the fact
> that you are passing an object to an entirely separate interpreter,
> and need to serialize the whole object graph and so on. If you can
> make that fast without serialization,
That is a worthy goal!
> for shared memory threads, then
> all the serialization becomes unnecessary, and you can either write to
> a pipe (fast, if it's a non-container), or use shared memory from the
> beginning (instantaneous). This is possible on any POSIX OS. Linux
> lets you go even further.
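If I follow, you mean something like this (sketch assumes CPython on a
POSIX system, where an anonymous mmap is MAP_SHARED by default):

    import mmap
    import os
    import struct

    # One shared page, visible to parent and child across fork().
    buf = mmap.mmap(-1, 8)

    if os.fork() == 0:
        # Child writes into the shared mapping...
        struct.pack_into("q", buf, 0, 42)
        os._exit(0)

    os.wait()
    # ...and the parent sees it with no serialization at all.
    print(struct.unpack_from("q", buf, 0)[0])   # -> 42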
And this is faster than passing objects around within the same
process? Does it play well with Python's memory model?