[Python-Dev] PEP 554 v3 (new interpreters module)

Mon Sep 25 20:42:02 EDT 2017

On Sat, Sep 23, 2017 at 2:45 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>> As to "running_interpreters()" and "idle_interpreters()", I'm not sure
>> what the benefit would be.  You can compose either list manually with
>> a simple comprehension:
>>
>>     [interp for interp in interpreters.list_all() if interp.is_running()]
>>     [interp for interp in interpreters.list_all() if not interp.is_running()]
>
> There is a inherit race condition in doing that, at least if
> interpreters are running in multiple threads (which I assume is going
> to be the overly dominant usage model).  That is why I'm proposing all
> three variants.

There's a race condition no matter what the API looks like -- having a
dedicated running_interpreters() lets you guarantee that the returned
list describes the set of interpreters that were running at some
moment in time, but you don't know when that moment was and by the
time you get the list, it's already out-of-date. So this doesn't seem
very useful. OTOH if we think that invariants like this are useful, we
might also want to guarantee that calling running_interpreters() and
idle_interpreters() gives two lists such that each interpreter appears
in exactly one of them, but that's impossible with this API; it'd
require a single function that returns both lists.

What problem are you trying to solve?

>> Likewise,
>> queue.Queue.send() supports blocking, in addition to providing a
>> put_nowait() method.
>
> queue.Queue.put() never blocks in the usual case (*), which is of an
> unbounded queue.  Only bounded queues (created with an explicit
> non-zero max_size parameter) can block in Queue.put().
>
> (*) and therefore also never deadlocks :-)

Unbounded queues also introduce unbounded latency and memory usage in
realistic situations. (E.g. a producer/consumer setup where the
producer runs faster than the consumer.) There's a reason why sockets
always have bounded buffers -- it's sometimes painful, but the pain is
intrinsic to building distributed systems, and unbounded buffers just
paper over it.

>> > send() blocking until someone else calls recv() is not only bad for
>> > performance,
>>
>> What is the performance problem?
>
> Intuitively, there must be some kind of context switch (interpreter
> switch?) at each send() call to let the other end receive the data,
> since you don't have any internal buffering.

Technically you just need the other end to wake up at some time in
between any two calls to send(), and if there's no GIL then this
doesn't necessarily require a context switch.

> Also, suddenly an interpreter's ability to exploit CPU time is
> dependent on another interpreter's ability to consume data in a timely
> manner (what if the other interpreter is e.g. stuck on some disk I/O?).
> IMHO it would be better not to have such coupling.

A small buffer probably is useful in some cases, yeah -- basically
enough to smooth out scheduler jitter.

>> > it also increases the likelihood of deadlocks.
>>
>> How much of a problem will deadlocks be in practice?
>
> I expect more often than expected, in complex systems :-)  For example,
> you could have a recv() loop that also from time to time send()s some
> data on another queue, depending on what is received.  But if that
> send()'s recipient also has the same structure (a recv() loop which
> send()s from time to time), then it's easy to imagine to two getting in
> a deadlock.

You kind of want to be able to create deadlocks, since the alternative
is processes that can't coordinate and end up stuck in livelocks or
with unbounded memory use etc.

>> I'm not sure I understand your concern here.  Perhaps I used the word
>> "sharing" too ambiguously?  By "sharing" I mean that the two actors
>> have read access to something that at least one of them can modify.
>> If they both only have read-only access then it's effectively the same
>> as if they are not sharing.
>
> Right.  What I mean is that you *can* share very simple "data" under
> the form of synchronization primitives.  You may want to synchronize
> your interpreters even they don't share user-visible memory areas.  The
> point of synchronization is not only to avoid memory corruption but
> also to regulate and orchestrate processing amongst multiple workers
> (for example processes or interpreters).  For example, a semaphore is
> an easy way to implement "I want no more than N workers to do this
> thing at the same time" ("this thing" can be something such as disk
> I/O).

It's fairly reasonable to implement a mutex using a CSP-style
unbuffered channel (send = acquire, receive = release). And the same
trick turns a channel with a fixed-size buffer into a bounded
semaphore. It won't be as efficient as a modern specialized mutex
implementation, of course, but it's workable.

Unfortunately while technically you can construct a buffered channel
out of an unbuffered channel, the construction's pretty unreasonable
(it needs two dedicated threads per channel).

-n

-- 
Nathaniel J. Smith -- https://vorpus.org