[Python-Dev] PEP 554 v3 (new interpreters module)

Antoine Pitrou solipsis at pitrou.net
Sat Sep 23 05:45:45 EDT 2017


Hi Eric,

On Fri, 22 Sep 2017 19:09:01 -0600
Eric Snow <ericsnowcurrently at gmail.com> wrote:
> 
> Please elaborate.  I'm interested in understanding what you mean here.
> Do you have some subinterpreter-based concurrency improvements in
> mind?  What aspect of CSP is the PEP following too faithfully?

See the discussion of blocking send()s below :-)

> As to "running_interpreters()" and "idle_interpreters()", I'm not sure
> what the benefit would be.  You can compose either list manually with
> a simple comprehension:
> 
>     [interp for interp in interpreters.list_all() if interp.is_running()]
>     [interp for interp in interpreters.list_all() if not interp.is_running()]

There is an inherent race condition in doing that, at least if
interpreters are running in multiple threads (which I assume is going
to be the overwhelmingly dominant usage model).  That is why I'm
proposing all three variants.
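
(A sketch against the draft API, assuming the module name proposed in
the PEP; idle_interpreters() here is the hypothetical helper:)

    import interpreters   # draft module from PEP 554, not in the stdlib

    def idle_interpreters():
        # The snapshot can be stale before it is even returned:
        # another thread may start or finish any of these interpreters
        # between list_all() and each is_running() check.
        return [interp for interp in interpreters.list_all()
                if not interp.is_running()]

A function provided by the module itself could at least hold the
relevant internal lock while building the list.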

> >  I don't think it's a
> > coincidence that the most varied kinds of I/O (from socket or file IO
> > to threading Queues to multiprocessing Pipes) have non-blocking send().  
> 
> Interestingly, you can set sockets to blocking mode, in which case
> send() will block until there is room in the kernel buffer.

Yes, but there *is* a kernel buffer, which is the whole point of my
comment: most similar primitives have internal buffering to keep the
user-facing send() API from blocking in the common case.
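
For illustration (standard library, nothing hypothetical): a
blocking-mode send() returns as soon as the data fits in the kernel
buffer, even though the peer hasn't called recv() yet:

    import socket

    a, b = socket.socketpair()
    a.setblocking(True)            # blocking mode (the default)
    a.send(b"hello")               # returns at once: the kernel buffers it
    assert b.recv(5) == b"hello"
    a.close(); b.close()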

> Likewise,
> queue.Queue.put() supports blocking, in addition to providing a
> put_nowait() method.

queue.Queue.put() never blocks in the usual case (*), which is that of
an unbounded queue.  Only bounded queues (created with an explicit
non-zero maxsize parameter) can block in Queue.put().

(*) and therefore also never deadlocks :-)
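
To make the distinction concrete (standard library only):

    import queue

    q = queue.Queue()                  # unbounded: put() never blocks
    q.put("item")                      # returns immediately

    bounded = queue.Queue(maxsize=1)
    bounded.put("first")               # fits; returns immediately
    try:
        bounded.put("second", block=False)   # full: raises queue.Full
    except queue.Full:
        pass
    # bounded.put("second") with the default block=True would block
    # here until another thread calls bounded.get().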

> Note that the PEP provides "recv_nowait()" and "send_nowait()" (names
> inspired by queue.Queue), allowing for a non-blocking send.

True, but it's not the same thing at all.  In the objects I mentioned,
send() mostly doesn't block, and it doesn't fail either.  In your model,
send_nowait() will routinely fail with an error if a recipient isn't
immediately available to recv() the data.
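
A sketch against the draft API (names as in the PEP; the exact
exception raised isn't specified here):

    import interpreters   # draft module from PEP 554, not in the stdlib

    recv_ch, send_ch = interpreters.create_channel()
    # Nothing is blocked in recv_ch.recv() and there is no internal
    # buffer, so this cannot complete immediately and fails:
    send_ch.send_nowait("data")
    # Contrast queue.Queue: put_nowait() on an unbounded queue always
    # succeeds, because the queue itself buffers the item.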

> > send() blocking until someone else calls recv() is not only bad for
> > performance,  
> 
> What is the performance problem?

Intuitively, there must be some kind of context switch (interpreter
switch?) at each send() call to let the other end receive the data,
since you don't have any internal buffering.

Also, an interpreter's ability to make use of CPU time suddenly
depends on another interpreter's ability to consume data in a timely
manner (what if the other interpreter is e.g. stuck on some disk I/O?).
IMHO it would be better not to have such coupling.

> > it also increases the likelihood of deadlocks.  
> 
> How much of a problem will deadlocks be in practice?

More often than one would expect, in complex systems :-)  For example,
you could have a recv() loop that also, from time to time, send()s some
data on another queue, depending on what is received.  But if that
send()'s recipient also has the same structure (a recv() loop which
send()s from time to time), then it's easy to imagine the two getting
into a deadlock, as sketched below.
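
A hedged sketch of that shape, using the PEP's channel method names;
worker(), handle() and the channel wiring are hypothetical:

    def worker(inbox, outbox):
        while True:
            obj = inbox.recv()
            reply = handle(obj)        # hypothetical processing step
            if reply is not None:
                # With unbuffered channels this blocks until the peer
                # reaches recv().  Wire two such workers to each other
                # and have both decide to send() at the same moment:
                # each blocks waiting for the other, and neither gets
                # back to recv().
                outbox.send(reply)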

> (FWIW, CSP
> provides rigorous guarantees about deadlock detection (which Go
> leverages), though I'm not sure how much benefit that can offer such a
> dynamic language as Python.)

Hmm... deadlock detection is one thing, but once a deadlock is
detected you must still solve it, right?

> I'm not sure I understand your concern here.  Perhaps I used the word
> "sharing" too ambiguously?  By "sharing" I mean that the two actors
> have read access to something that at least one of them can modify.
> If they both only have read-only access then it's effectively the same
> as if they are not sharing.

Right.  What I mean is that you *can* share very simple "data" in the
form of synchronization primitives.  You may want to synchronize your
interpreters even if they don't share user-visible memory areas.  The
point of synchronization is not only to avoid memory corruption but
also to regulate and orchestrate processing amongst multiple workers
(for example processes or interpreters).  For example, a semaphore is
an easy way to implement "I want no more than N workers to do this
thing at the same time" ("this thing" can be something such as disk
I/O).
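
With today's threading primitives (a sketch; PEP 554 doesn't yet
define cross-interpreter semaphores, so threads stand in for
interpreters here):

    import threading

    MAX_CONCURRENT = 4                 # "no more than N workers"
    io_slots = threading.BoundedSemaphore(MAX_CONCURRENT)

    def read_file(path):
        # Blocks if MAX_CONCURRENT workers are already inside,
        # regulating disk I/O without sharing user-visible data.
        with io_slots:
            with open(path, "rb") as f:
                return f.read()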

Regards

Antoine.

