Hello,

First, I would like to say that I have no fundamental problem with this PEP. While I agree with Nathaniel that the rationale given about the CSP concurrency model seems a bit weak, the author is obviously expressing his opinion there and I won't object to that. However, I think the PEP is desirable for other reasons. Mostly, I hope that by making the subinterpreters functionality available to pure Python programmers (while it was formerly an advanced and arcane part of the C API), we will spur a bunch of interesting third-party experiments, including possibilities that we on python-dev have not thought about.

The appeal of the PEP for experimentation is manifold:

1) the ability to concurrently run independent execution environments without spawning child processes (which on some platforms and in some situations may not be very desirable: for example on Windows, where the cost of spawning is rather high; also, child processes may crash, and sometimes it is not easy for the parent to recover, especially if a synchronization primitive is left in an unexpected state)

2) the potential for parallelizing CPU-bound pure Python code in a single process, if a per-interpreter GIL is finally implemented

3) easier support for sharing large data between separate execution environments, without the hassle of setting up shared memory or the fragility of relying on fork() semantics

(and, as I said, I hope people find other applications)

As for the argument that we already have asyncio and several other packages, I actually think that combining these different concurrency mechanisms would be interesting for complex applications (such as distributed systems). For that, however, I think the PEP as currently written is a bit lacking; see below.

Now for the detailed comments.

* I think the module should indeed be provisional. Experimentation may discover warts that call for a change in the API or semantics. Let's not prevent ourselves from fixing those issues.
* The "association" timing seems quirky and potentially annoying: an interpreter only becomes associated with a channel the first time it calls recv() or send(). How about, instead, associating an interpreter with a channel as soon as that channel is given to it through `Interpreter.run(..., channels=...)` (or received through `recv()`)?

* How hard would it be, in the current implementation, to add buffering to channels? It doesn't have to be infinite: you can choose a fixed buffer size (or make it configurable in the create() function, which allows passing 0 for unbuffered). Like Nathaniel, I think unbuffered channels will quickly be annoying to work with (yes, you can create a helper thread... but now you have one additional thread per channel, which isn't pretty -- especially with the GIL).

* In the same vein, I think channels should allow adding readiness callbacks (that are called whenever a channel becomes ready for sending or receiving, respectively). This would make it easy to plug them into an event loop or other concurrency systems (such as Future-based concurrency). Note that each interpreter "associated" with a channel should be able to set its own readiness callback: so one callback per Python object representing the channel, but potentially multiple callbacks for the underlying channel primitive.

(how would the callback be scheduled for execution in the right interpreter? perhaps using `_PyEval_AddPendingCall()` or a similar mechanism?)

* I think either `interpreters.get_main()` or `interpreters.is_main()` is desirable. Inevitably, the slight differences between main and non-main interpreters will surface in non-trivial applications (finalization issues in distributed systems can really be hairy). It seems this should be mostly costless to provide, so let's do it.

* I do think a minimal synchronization primitive would be nice.
Either a Lock (in the Python sense) or a Semaphore: both should be relatively easy to provide, by wrapping an OS-level synchronization primitive. Then you can recreate all high-level synchronization primitives, like the threading and multiprocessing modules do (using a Lock or a Semaphore, respectively).

(note you should be able to emulate a semaphore using blocking send() and recv() calls, but that's probably not very efficient, and efficiency is important)

Of course, I hope these are all actionable before beta1 :-) If not, here is my preferential priority list:

* High priority: fix association timing
* High priority: either buffering /or/ readiness callbacks
* Middle priority: get_main() /or/ is_main()
* Middle / low priority: a simple synchronization primitive

But I would stress that the more of these we provide, the more we encourage people to experiment without pulling out too much of their hair.

(also, of course, I hope other people read the PEP and emit feedback)

Best regards

Antoine.
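Antoine's parenthetical about emulating a semaphore with blocking send()/recv() looks like this in miniature. Since the proposed interpreters channels don't exist yet, queue.Queue stands in for a buffered channel here; only the token-passing pattern is the point:

```python
import queue


class ChannelSemaphore:
    """A counting semaphore built on a buffered channel.

    queue.Queue is a stand-in for a PEP 554 channel: put() plays the
    role of send() and get() the role of a blocking recv().
    """

    def __init__(self, value=1):
        self._chan = queue.Queue()
        for _ in range(value):
            self._chan.put(None)  # pre-load one token per available slot

    def acquire(self):
        # Blocks until a token is available, like a blocking recv().
        self._chan.get()

    def release(self):
        # Returns a token, like send(); wakes one blocked acquirer.
        self._chan.put(None)


sem = ChannelSemaphore(2)
sem.acquire()
sem.acquire()   # both slots now taken
sem.release()
sem.acquire()   # succeeds because a token was returned
```

As Antoine notes, each acquire/release round-trips through the channel machinery, which is why this is probably less efficient than a dedicated primitive.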
On Sat, 18 Apr 2020 19:02:47 +0200 Antoine Pitrou <solipsis@pitrou.net> wrote:
* I do think a minimal synchronization primitive would be nice. Either a Lock (in the Python sense) or a Semaphore: both should be relatively easy to provide, by wrapping an OS-level synchronization primitive. Then you can recreate all high-level synchronization primitives, like the threading and multiprocessing modules do (using a Lock or a Semaphore, respectively).
By the way, perhaps this could even be implemented as making _threading.Lock shareable. This would probably require some changes in the underlying C Lock structure (e.g. pointing to an atomically-refcounted shared control block), but nothing intractable, and reasonably efficient.

Regards

Antoine.
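A Python-level sketch of the shared-control-block idea Antoine describes; the real change would be in the C Lock structure, and every name here is made up for illustration:

```python
import threading


class _LockControlBlock:
    """Stands in for the atomically-refcounted C control block: one
    OS-level lock plus a count of the interpreters sharing it."""

    def __init__(self):
        self.oslock = threading.Lock()      # the real synchronization primitive
        self.refcount = 1
        self._refmutex = threading.Lock()   # guards refcount updates

    def incref(self):
        with self._refmutex:
            self.refcount += 1


class ShareableLock:
    """Each interpreter would hold its own wrapper around the same block."""

    def __init__(self, block=None):
        if block is None:
            self._block = _LockControlBlock()
        else:
            block.incref()                  # a new interpreter now shares it
            self._block = block

    def share(self):
        # What "sending" the lock through a channel would do under the hood.
        return ShareableLock(self._block)

    def acquire(self, blocking=True):
        return self._block.oslock.acquire(blocking)

    def release(self):
        self._block.oslock.release()


lock_a = ShareableLock()
lock_b = lock_a.share()                     # handed to another interpreter
lock_a.acquire()
contended = lock_b.acquire(blocking=False)  # False: same underlying lock
lock_a.release()
```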
On Sat, Apr 18, 2020 at 11:16 AM Antoine Pitrou <solipsis@pitrou.net> wrote:
* I do think a minimal synchronization primitive would be nice. Either a Lock (in the Python sense) or a Semaphore: both should be relatively easy to provide, by wrapping an OS-level synchronization primitive. Then you can recreate all high-level synchronization primitives, like the threading and multiprocessing modules do (using a Lock or a Semaphore, respectively).
(note you should be able to emulate a semaphore using blocking send() and recv() calls, but that's probably not very efficient, and efficiency is important)
You make a good point about efficiency. The blocking is definitely why I figured we could get away with avoiding a locking primitive.

One reason I wanted to avoid a shareable synchronization primitive is that I've had many bad experiences with something similar in Go (mixing locks, channels, and goroutines). I'll also admit that the ideas in CSP had an impact on this. :) Mixing channels and locks can be a serious pain point. So if we do end up supporting shared locks, I suppose I'd feel better about it if we had an effective way to discourage folks from using them normally. Two possible approaches:

* keep them in a separate module on PyPI that folks could use when experimenting
* add a shareable lock class (to the "interpreters" module) with a name that makes it clear you shouldn't use it normally.

If blocking send/recv were efficient enough, I'd rather not have a shareable lock at all. Or I suppose it could be re-implemented later using a channel. :)

On Sat, Apr 18, 2020 at 11:30 AM Antoine Pitrou <solipsis@pitrou.net> wrote:
By the way, perhaps this could even be implemented as making _threading.Lock shareable. This would probably require some changes in the underlying C Lock structure (e.g. pointing to an atomically-refcounted shared control block), but nothing intractable, and reasonably efficient.
Making _threading.Lock shareable kind of seems like the best way to go. Honestly I was already looking into it relative to the implementation for the low-level channel_send_wait(). [1] However, I got nervous about that as soon as I started looking at how to separate the low-level mutex from the Lock object (so it could be shared). :) So I'd probably want some help on the implementation work. -eric [1] https://www.python.org/dev/peps/pep-0554/#return-a-lock-from-send
On 19/04/20 5:02 am, Antoine Pitrou wrote:
* How hard would it be, in the current implementation, to add buffering to channels?
* In the same vein, I think channels should allow adding readiness callbacks
Of these, I think the callbacks are more fundamental. If you have a non-buffered channel with readiness callbacks, you can implement a buffered channel on top of it. -- Greg
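Greg's claim (that buffering can be layered on top of readiness callbacks) can be demonstrated with a toy pair of classes. Nothing here is the PEP's API; it only shows the layering:

```python
import collections
import threading


class CallbackChannel:
    """Toy unbuffered channel: send() hands the item straight to the
    receive-readiness callback, blocking the sender until one is set."""

    def __init__(self):
        self._callback = None
        self._cv = threading.Condition()

    def set_recv_callback(self, fn):
        with self._cv:
            self._callback = fn
            self._cv.notify_all()

    def send(self, item):
        with self._cv:
            while self._callback is None:   # rendezvous: wait for a receiver
                self._cv.wait()
            self._callback(item)


class BufferedChannel:
    """Buffering built purely on the callback mechanism: the callback
    drains every sent item into a deque, so send() never has to wait."""

    def __init__(self):
        self._raw = CallbackChannel()
        self._buffer = collections.deque()
        self._raw.set_recv_callback(self._buffer.append)

    def send(self, item):
        self._raw.send(item)

    def recv(self):
        return self._buffer.popleft()


chan = BufferedChannel()
for i in range(3):
    chan.send(i)    # returns immediately; items sit in the buffer
```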
On Sat, Apr 18, 2020 at 6:50 PM Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
On 19/04/20 5:02 am, Antoine Pitrou wrote:
* How hard would it be, in the current implementation, to add buffering to channels?
* In the same vein, I think channels should allow adding readiness callbacks
Of these, I think the callbacks are more fundamental. If you have a non-buffered channel with readiness callbacks, you can implement a buffered channel on top of it.
Some questions:

* Do you think it is worth adding readiness callbacks if we already have channel buffering?
* Would a low-level channel implementation based on callbacks or locks be better (simpler, faster, etc.) than one based on buffering?
* Would readiness callbacks in the high-level API be more or less user-friendly than alternatives: optional blocking, a lock, etc.?

FWIW, I tend to find callbacks a greater source of complexity than alternatives.

Thanks!

-eric
On 21/04/20 8:29 am, Eric Snow wrote:
* Would a low-level channel implementation based on callbacks or locks be better (simpler, faster, etc.) than one based on buffering?
Depends on what you mean by "better". Callbacks are more versatile; a buffered channel just does buffering, but with callbacks you can do other things, e.g. hooking into an event loop.
* Would readiness callbacks in the high-level API be more or less user-friendly than alternatives: optional blocking, a lock, etc.?
I would consider callbacks to be part of a low-level layer that you wouldn't use directly most of the time. Some user-friendly high-level things such as buffered channels would be provided. Efficiency is a secondary consideration. If it turns out to be a problem, that can be addressed later. -- Greg
On Mon, Apr 20, 2020 at 4:19 PM Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
On 21/04/20 8:29 am, Eric Snow wrote:
* Would a low-level channel implementation based on callbacks or locks be better (simpler, faster, etc.) than one based on buffering?
Depends on what you mean by "better". Callbacks are more versatile; a buffered channel just does buffering, but with callbacks you can do other things, e.g. hooking into an event loop.
Thanks for clarifying. For the event loop case, what is the downside to adapting to the API in the existing proposal?
* Would readiness callbacks in the high-level API be more or less user-friendly than alternatives: optional blocking, a lock, etc.?
I would consider callbacks to be part of a low-level layer that you wouldn't use directly most of the time. Some user-friendly high-level things such as buffered channels would be provided.
Ah, PEP 554 is just about the high-level API. Currently in the low-level API recv() doesn't ever block (instead raising ChannelEmptyError if empty) and channel_send() returns a pre-acquired lock that releases once the object is received. I'm not opposed to a different low-level API, but keep in mind that we're short on time.
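The low-level semantics Eric describes (a never-blocking recv() that raises ChannelEmptyError, and a send() that returns a pre-acquired lock released once the object is received) can be modeled single-process. The class below is a toy stand-in, not the real C implementation; only ChannelEmptyError and the returned-lock behavior come from the discussion:

```python
import queue
import threading


class ChannelEmptyError(Exception):
    """Raised by the low-level recv() when no object is waiting."""


class LowLevelChannel:
    """Toy model of the low-level API: recv() never blocks, and send()
    returns an already-acquired lock that is released on receipt."""

    def __init__(self):
        self._items = queue.Queue()

    def send(self, obj):
        done = threading.Lock()
        done.acquire()                  # hand back a pre-acquired lock
        self._items.put((obj, done))
        return done

    def recv(self):
        try:
            obj, done = self._items.get_nowait()
        except queue.Empty:
            raise ChannelEmptyError from None
        done.release()                  # signals the sender's returned lock
        return obj


chan = LowLevelChannel()
pending = chan.send("hello")            # pending is held until receipt
received = chan.recv()                  # now pending can be acquired again
```

A sender wanting blocking semantics would simply call `chan.send(obj).acquire()`, which returns once the receiver has taken the object.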
Efficiency is a secondary consideration. If it turns out to be a problem, that can be addressed later.
+1 -eric
On 21/04/20 10:35 am, Eric Snow wrote:
For the event loop case, what is the downside to adapting to the API in the existing proposal?
If you mean the suggestion of having async send and receive methods, that's okay. My point was that before responding to requests to add individual features such as buffered channels, it might be better to provide a general mechanism that would allow people to implement their own favourite features themselves.

It seems like you're going to need some kind of callback mechanism under the covers to implement async send and receive anyway, so you may as well expose it as part of the official API.

-- Greg
Thanks for the feedback, Antoine. I've responded inline below and will be making appropriate changes to the PEP. One point I'd like to reinforce before my comments is the PEP's emphasis on minimalism. From PEP 554:

    This proposal is focused on enabling the fundamental capability of multiple isolated interpreters in the same Python process. This is a new area for Python so there is relative uncertainty about the best tools to provide as companions to subinterpreters. Thus we minimize the functionality we add in the proposal as much as possible.

I don't think anything you've mentioned really deviates much from that, and making the module provisional helps. I just want us to be careful not to add stuff that we'll decide we want to remove later. :)

FYI, I'm already updating the PEP based on feedback from the other email thread. I'll let you know once all the updates are done.

On Sat, Apr 18, 2020 at 11:16 AM Antoine Pitrou <solipsis@pitrou.net> wrote:
First, I would like to say that I have no fundamental problem with this PEP. While I agree with Nathaniel that the rationale given about the CSP concurrency model seems a bit weak, the author is obviously expressing his opinion there and I won't object to that. However, I think the PEP is desirable for other reasons. Mostly, I hope that by making the subinterpreters functionality available to pure Python programmers (while it was formerly an advanced and arcane part of the C API), we will spur a bunch of interesting third-party experiments, including possibilities that we on python-dev have not thought about.
The experimentation angle is one I didn't consider all that much, but you make a good point.
The appeal of the PEP for experimentation is manifold:

1) the ability to concurrently run independent execution environments without spawning child processes (which on some platforms and in some situations may not be very desirable: for example on Windows, where the cost of spawning is rather high; also, child processes may crash, and sometimes it is not easy for the parent to recover, especially if a synchronization primitive is left in an unexpected state)

2) the potential for parallelizing CPU-bound pure Python code in a single process, if a per-interpreter GIL is finally implemented

3) easier support for sharing large data between separate execution environments, without the hassle of setting up shared memory or the fragility of relying on fork() semantics
(and as I said, I hope people find other applications)
These are covered in the PEP, though not together in the rationale, etc. Should I add explicit mention of experimentation as a motivation in the abstract or rationale sections? Would you like me to add a dedicated paragraph/section covering experimentation?
As for the argument that we already have asyncio and several other packages, I actually think that combining these different concurrency mechanisms would be interesting for complex applications (such as distributed systems). For that, however, I think the PEP as currently written is a bit lacking; see below.
Yeah, that would be interesting. What in particular will help make subinterpreters and asyncio more cooperative?
Now for the detailed comments.
* I think the module should indeed be provisional. Experimentation may discover warts that call for a change in the API or semantics. Let's not prevent ourselves from fixing those issues.
Sounds good.
* The "association" timing seems quirky and potentially annoying: an interpreter only becomes associated with a channel the first time it calls recv() or send(). How about, instead, associating an interpreter with a channel as soon as that channel is given to it through `Interpreter.run(..., channels=...)` (or received through `recv()`)?
That seems fine to me. I do not recall the exact reason for tying association to recv() or send(). I only vaguely remember doing it that way for a technical reason. If I determine that reason then I'll bring it up. In the meantime I'll update the PEP to associate interpreters when the channel end is sent.

FWIW, it may have been influenced by the automatic channel closing when no interpreters are associated. If interpreters are associated when channel ends are sent (rather than when used) then interpreters will have to be more careful about releasing channels. That's just a guess as to why I did it that way. :)
* How hard would it be, in the current implementation, to add buffering to channels? It doesn't have to be infinite: you can choose a fixed buffer size (or make it configurable in the create() function, which allows passing 0 for unbuffered). Like Nathaniel, I think unbuffered channels will quickly be annoying to work with (yes, you can create a helper thread... now you have one additional thread per channel, which isn't pretty -- especially with the GIL).
Currently the low-level implementation supports "infinite" channel buffering. The restriction in proposed high-level API was there to allow us to go with a simpler low-level implementation. However, I don't think that is necessary at this point. I'll update the PEP.
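Antoine's suggested create() knob might look roughly like the sketch below, with queue.Queue standing in for real channel ends. The parameter name and the None-for-unbounded convention are assumptions for illustration, not the PEP's API:

```python
import queue


def create(buffersize=0):
    """Sketch of a configurable create(): 0 means unbuffered, None means
    the unbounded buffering the low-level implementation already supports,
    and any other value is a fixed buffer size."""
    if buffersize == 0:
        # queue.Queue has no true rendezvous mode; maxsize=1 is the
        # closest stand-in for an unbuffered channel in this sketch.
        return queue.Queue(maxsize=1)
    if buffersize is None:
        return queue.Queue()            # unbounded, as in the low level
    return queue.Queue(maxsize=buffersize)


chan = create(buffersize=3)
for i in range(3):
    chan.put(i)                         # none of these sends block
full = chan.full()                      # a fourth put() would now block
```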
* In the same vein, I think channels should allow adding readiness callbacks (that are called whenever a channel becomes ready for sending or receiving, respectively). This would make it easy to plug them into an event loop or other concurrency systems (such as Future-based concurrency). Note that each interpreter "associated" with a channel should be able to set its own readiness callback: so one callback per Python object representing the channel, but potentially multiple callbacks for the underlying channel primitive.
Would this be as useful if we have buffered channels? It sounds like you wanted one or the other but not both.
(how would the callback be scheduled for execution in the right interpreter? perhaps using `_PyEval_AddPendingCall()` or a similar mechanism?)
Yeah, the pending call machinery has become my dear friend for several parts of the low-level implementation for the PEP. :)
* I think either `interpreters.get_main()` or `interpreters.is_main()` is desirable. Inevitably, the slight differences between main and non-main interpreters will surface in non-trivial applications (finalization issues in distributed systems can really be hairy). It seems this should be mostly costless to provide, so let's do it.
In the PEP (https://www.python.org/dev/peps/pep-0554/#get-main) I have this listed as a deferred functionality:

    for the basic functionality of a high-level API a get_main() function is not necessary. Furthermore, there is no requirement that a Python implementation have a concept of a main interpreter. So until there's a clear need we'll leave get_main() out.

My preference would be to leave it out, since it's much harder to remove something later than to add it later. However, it isn't a major issue and is one of the deferred bits that I almost kept in the PEP. :) So I'll go ahead and add it to the proposed API.
* I do think a minimal synchronization primitive would be nice. Either a Lock (in the Python sense) or a Semaphore: both should be relatively easy to provide, by wrapping an OS-level synchronization primitive. Then you can recreate all high-level synchronization primitives, like the threading and multiprocessing modules do (using a Lock or a Semaphore, respectively).
(note you should be able to emulate a semaphore using blocking send() and recv() calls, but that's probably not very efficient, and efficiency is important)
I'll address this specific ask in a separate post, to keep the discussion focused.
Of course, I hope these are all actionable before beta1 :-) If not, here is my preferential priority list:
* High priority: fix association timing * High priority: either buffering /or/ readiness callbacks * Middle priority: get_main() /or/ is_main()
These should be doable for beta1 since they're either trivial or already done. :)
* Middle / low priority: a simple synchronization primitive
This might be harder to get done for beta1. That said, with a provisional status we may be able to add it after beta1. :)
But I would stress the more of these we provide, the more we encourage people to experiment without pulling too much of their hair.
Good point. I think the emphasis on experimentation is valuable. Thanks again, -eric
On Mon, 20 Apr 2020 14:22:03 -0600 Eric Snow <ericsnowcurrently@gmail.com> wrote:
The appeal of the PEP for experimentation is manifold:

1) the ability to concurrently run independent execution environments without spawning child processes (which on some platforms and in some situations may not be very desirable: for example on Windows, where the cost of spawning is rather high; also, child processes may crash, and sometimes it is not easy for the parent to recover, especially if a synchronization primitive is left in an unexpected state)

2) the potential for parallelizing CPU-bound pure Python code in a single process, if a per-interpreter GIL is finally implemented

3) easier support for sharing large data between separate execution environments, without the hassle of setting up shared memory or the fragility of relying on fork() semantics
(and as I said, I hope people find other applications)
These are covered in the PEP, though not together in the rationale, etc. Should I add explicit mention of experimentation as a motivation in the abstract or rationale sections? Would you like me to add a dedicated paragraph/section covering experimentation?
I was mostly exposing my thought process here :-) IOW, you don't have to do anything, except if you think that would be helpful.
As for the argument that we already have asyncio and several other packages, I actually think that combining these different concurrency mechanisms would be interesting for complex applications (such as distributed systems). For that, however, I think the PEP as currently written is a bit lacking; see below.
Yeah, that would be interesting. What in particular will help make subinterpreters and asyncio more cooperative?
Readiness callbacks would help wrangle any kind of asynchronous / event-driven framework around subinterpreters.
* In the same vein, I think channels should allow adding readiness callbacks (that are called whenever a channel becomes ready for sending or receiving, respectively). This would make it easy to plug them into an event loop or other concurrency systems (such as Future-based concurrency). Note that each interpreter "associated" with a channel should be able to set its own readiness callback: so one callback per Python object representing the channel, but potentially multiple callbacks for the underlying channel primitive.
Would this be as useful if we have buffered channels? It sounds like you wanted one or the other but not both.
Both are useful at somewhat different levels (though as Greg said, if you have readiness callbacks, you can probably cook up a buffering layer using them). Especially, readiness callbacks (or some other form of push notification) are desirable for reasonable interaction with an event loop.
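Antoine's event-loop point can be made concrete. Everything below is illustrative: NotifyingChannel is a toy stand-in for a channel that supports a recv-readiness callback (no such API exists yet); the point is the call_soon_threadsafe bridging pattern for asyncio:

```python
import asyncio
import queue


class NotifyingChannel:
    """Toy channel with the recv-readiness callback Antoine asks for:
    the callback fires whenever data becomes available."""

    def __init__(self):
        self._items = queue.Queue()
        self._on_ready = None

    def set_ready_callback(self, fn):
        self._on_ready = fn

    def send(self, item):
        self._items.put(item)
        if self._on_ready is not None:
            self._on_ready()            # push notification to the receiver

    def recv_nowait(self):
        return self._items.get_nowait()


async def recv_async(chan):
    """Bridge the callback into asyncio: the readiness callback just
    schedules an Event to be set on the loop's thread."""
    loop = asyncio.get_running_loop()
    ready = asyncio.Event()
    chan.set_ready_callback(lambda: loop.call_soon_threadsafe(ready.set))
    await ready.wait()
    return chan.recv_nowait()


async def main():
    chan = NotifyingChannel()
    loop = asyncio.get_running_loop()
    loop.call_later(0.01, chan.send, "ping")   # a send from "elsewhere"
    return await recv_async(chan)


result = asyncio.run(main())
```

The same callback could just as well resolve a Future or feed a selector-style wakeup pipe; call_soon_threadsafe is simply the thread-safe entry point asyncio already provides for exactly this kind of external notification.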
Of course, I hope these are all actionable before beta1 :-) If not, here is my preferential priority list:
* High priority: fix association timing * High priority: either buffering /or/ readiness callbacks * Middle priority: get_main() /or/ is_main()
These should be doable for beta1 since they're either trivial or already done. :)
Great :-) Best regards Antoine.
On Mon, Apr 20, 2020 at 2:22 PM Eric Snow <ericsnowcurrently@gmail.com> wrote:
On Sat, Apr 18, 2020 at 11:16 AM Antoine Pitrou <solipsis@pitrou.net> wrote:
* The "association" timing seems quirky and potentially annoying: an interpreter only becomes associated with a channel the first time it calls recv() or send(). How about, instead, associating an interpreter with a channel as soon as that channel is given to it through `Interpreter.run(..., channels=...)` (or received through `recv()`)?
That seems fine to me. I do not recall the exact reason for tying association to recv() or send(). I only vaguely remember doing it that way for a technical reason. If I determine that reason then I'll bring it up. In the meantime I'll update the PEP to associate interpreters when the channel end is sent.
FWIW, it may have been influenced by the automatic channel closing when no interpreters are associated. If interpreters are associated when channel ends are sent (rather than when used) then interpreters will have to be more careful about releasing channels. That's just a guess as to why I did it that way. :)
As I've gone to update the PEP for this I'm feeling less comfortable with changing it. There is a subtle difference which concretely manifests in two ways.

Firstly, the programmatic exposure of "associated" (SendChannel.interpreters and RecvChannel.interpreters) would be different. With the current specification, "associated" means "has been used by". With your recommendation it would mean "is accessible by". Is it more useful to think about them one way or the other? Would there be value in making both meanings part of the API separately ("associated" + "bound") somehow?

Secondly, with the current spec channels get automatically closed sooner, effectively as soon as all wrapping objects *that were used* are garbage collected (or released). With your recommendation it only happens as soon as all wrapping objects are garbage collected (or released). In the former case channels could get auto-closed before you expect them to. In the latter case they could leak if users forget to release them when unused. Is there a good way to address both downsides?

-eric
On Mon, Apr 20, 2020 at 4:23 PM Eric Snow <ericsnowcurrently@gmail.com> wrote:
As I've gone to update the PEP for this I'm feeling less comfortable with changing it.
Also, the resulting text of the PEP makes it a little harder to follow when an interpreter gets associated. However, this is partly an artifact of the structure of the PEP. (The details of association need to be moved to a separate section.) The same situation would apply to docs. However, I'm not sure it would be a problem in practice. -eric
On 21/04/20 10:47 am, Eric Snow wrote:
On Mon, Apr 20, 2020 at 4:23 PM Eric Snow <ericsnowcurrently@gmail.com> wrote:
As I've gone to update the PEP for this I'm feeling less comfortable with changing it.
I don't get this whole business of channels being associated with interpreters, or why there needs to be a distinction between release() and close().

To my mind, a channel reference should be like a file descriptor for a pipe. When you've finished with it, you close() it. When the last reference to a channel is closed or garbage collected, the channel disappears. Why make it any more complicated than that?

You seem to be worried about channels getting leaked if someone forgets to close them. But it's just the same for files and pipes, and nobody seems to worry about that.

-- Greg
On Tue, Apr 21, 2020 at 1:39 AM Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
I don't get this whole business of channels being associated with interpreters, or why there needs to be a distinction between release() and close().
To my mind, a channel reference should be like a file descriptor for a pipe. When you've finished with it, you close() it. When the last reference to a channel is closed or garbage collected, the channel disappears. Why make it any more complicated than that?
You've mostly described what the PEP proposes: when all objects wrapping a channel in an interpreter are destroyed, that channel is automatically released. When all interpreters have released the channel then it is automatically closed.

The main difference is that the PEP also provides a way to explicitly release or close a channel. Providing just "close()" would mean one interpreter could stomp on all other interpreters' use of a channel. Working around that would require clunky coordination (likely through other channels). The alternative ("release()") is much simpler.
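The lifecycle described above (per-interpreter release, automatic close once every interpreter has released, plus an explicit close() that stomps on everyone) can be modeled in a few lines. The class and method names are illustrative, not the PEP's API:

```python
class SharedChannelState:
    """Toy model of the PEP's channel lifecycle: each interpreter
    releases its use independently; the channel closes for everyone only
    when the last interpreter has released it, or on an explicit close()."""

    def __init__(self):
        self.associated = set()
        self.closed = False

    def associate(self, interp_id):
        self.associated.add(interp_id)

    def release(self, interp_id):
        # Per-interpreter: "I'm done with my end of this channel."
        self.associated.discard(interp_id)
        if not self.associated:
            self.closed = True     # auto-close once everyone has released

    def close(self):
        # Global: stomps on every other interpreter's use of the channel.
        self.associated.clear()
        self.closed = True


state = SharedChannelState()
state.associate("main")
state.associate("sub-1")
state.release("main")              # channel stays open for sub-1
still_open = not state.closed
state.release("sub-1")             # last release auto-closes the channel
```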
You seem to be worried about channels getting leaked if someone forgets to close them. But it's just the same for files and pipes, and nobody seems to worry about that.
Fair enough. :) -eric
On 22/04/20 3:57 am, Eric Snow wrote:
The main difference is that the PEP also provides a way to explicitly release or close a channel. Providing just "close()" would mean one interpreter could stomp on all other interpreters' use of a channel.
What I'm suggesting is that close() should do what the PEP defines release() as doing, and release() shouldn't exist.

I don't see why an interpreter needs the ability to close a channel for any *other* interpreter. There is no such ability for files and pipes.

-- Greg
On Tue, Apr 21, 2020 at 11:21 PM Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
What I'm suggesting is that close() should do what the PEP defines release() as doing, and release() shouldn't exist.
I don't see why an interpreter needs the ability to close a channel for any *other* interpreter. There is no such ability for files and pipes.
Ah, thanks for clarifying. One of the main inspirations for the proposed channels is CSP (and, somewhat relatedly, my in-depth experience with Go). Channels are more than just a thread-safe data transport between interpreters. They also provide relatively straightforward mechanisms for managing cooperation in a group of interpreters. Having a distinct "close()" vs. "release()" is part of that. Furthermore, IMHO "release" is better at communicating the per-interpreter nature than "close". "release()" doesn't close the channel. It communicates that that particular interpreter is done using that end of the channel.

I appreciate that you brought up comparisons with other objects and data types. I'm a fan of adapting existing APIs and patterns, especially from proven sources. That said, the comparison with files would be more complete if channels were persistent. With pipes the main difference is how many actors are involved. Pipes involve one sender and one receiver, right?

FWIW, I also looked at other data types. Queues are the closest thing to the proposed channels, and I almost called them that, but there are a few subtle differences from queue.Queue and I didn't want folks inadvertently confusing the two.

-eric
Even then, disconnect seems like the primary use case, with a channel.kill_for_all being a specialized subclass. One argument for leaving it to a subclass is that it isn't clear what other interpreters should do when that happens. Shut down? Start getting exceptions if they happen to use it again, with no information until then?
On 29/04/20 2:12 pm, Eric Snow wrote:
One of the main inspirations for the proposed channels is CSP (and somewhat relatedly, my in-depth experience with Go). Channels are more than just a thread-safe data transport between interpreters.
It's a while since I paid attention to the fine details of CSP. I'll have to do some research on that.
Furthermore, IMHO "release" is better at communicating the per-interpreter nature than "close".
Channels are a similar enough concept to pipes that I think it would be confusing to have "close" mean "close for all interpreters". Everyone understands that "closing" a pipe only means you're closing your reference to one end of it, and they will probably assume closing a channel means the same. Maybe it would be better to have a different name such as "destroy" for a complete shutdown.
With pipes the main difference is how many actors are involved. Pipes involve one sender and one receiver, right?
Not necessarily. Mostly they're used that way, but there's nothing to stop multiple processes having a handle on the reading or writing end of a pipe simultaneously. Of course you have to be careful about how you interleave the reads and writes if you do that. -- Greg
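Greg's pipe point is easy to demonstrate. Here two threads stand in for the multiple processes he mentions, sharing the write end of a single os.pipe(); the descriptor-sharing works the same either way:

```python
import os
import threading

# Nothing limits a pipe end to a single user: both threads below write
# to the same write-end descriptor of one pipe.
r, w = os.pipe()

def writer(msg):
    os.write(w, msg)       # writes of <= PIPE_BUF bytes are atomic (POSIX)

t1 = threading.Thread(target=writer, args=(b"aaaa",))
t2 = threading.Thread(target=writer, args=(b"bbbb",))
t1.start(); t2.start()
t1.join(); t2.join()

data = b""
while len(data) < 8:       # both writers are done, so 8 bytes are waiting
    data += os.read(r, 8 - len(data))
os.close(r)
os.close(w)
```

Because each 4-byte write is atomic, the reader sees one message followed by the other, never an interleaving of their bytes; with larger writes, that careful interleaving Greg mentions becomes the callers' problem.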
On Wed, Apr 29, 2020, 22:05 Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Furthermore, IMHO "release" is better at communicating the per-interpreter nature than "close".
Channels are a similar enough concept to pipes that I think it would be confusing to have "close" mean "close for all interpreters". Everyone understands that "closing" a pipe only means you're closing your reference to one end of it, and they will probably assume closing a channel means the same.
FWIW, I'd compare channels more closely to queues than to pipes. -eric
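The queue comparison can be made concrete with `queue.Queue` and threads (an analogy only; the PEP's channels cross interpreters, not threads). Unlike a pipe's byte stream, a queue delivers discrete objects, and multiple producers can feed it without their messages interleaving:

```python
import queue
import threading

q = queue.Queue()   # message-oriented, like the proposed channels

def producer(name):
    # each put() delivers one whole object; messages never interleave
    for i in range(3):
        q.put((name, i))

threads = [threading.Thread(target=producer, args=(n,)) for n in ("a", "b")]
for t in threads:
    t.start()
for t in threads:
    t.join()

items = [q.get() for _ in range(6)]
print(sorted(items))
```

The per-object delivery is what makes "close for me" vs. "close for everyone" a real design question: several senders and receivers can hold the same channel at once.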
On 21/04/20 10:23 am, Eric Snow wrote:
with the current spec channels get automatically closed sooner, effectively as soon as all wrapping objects *that were used* are garbage collected (or released).
Maybe I'm missing something, but just because an object hasn't been used *yet* doesn't mean it isn't going to be used in the future, so isn't this wildly wrong? -- Greg
On Tue, 21 Apr 2020 18:27:41 +1200 Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
On 21/04/20 10:23 am, Eric Snow wrote:
with the current spec channels get automatically closed sooner, effectively as soon as all wrapping objects *that were used* are garbage collected (or released).
Maybe I'm missing something, but just because an object hasn't been used *yet* doesn't mean it isn't going to be used in the future, so isn't this wildly wrong?
That's my concern indeed. An interpreter may be willing to wait for incoming data in the future, without needing it immediately. (that incoming data may even represent something very trivial, such as a request to terminate itself) Regards Antoine.
On Tue, Apr 21, 2020 at 2:18 AM Antoine Pitrou <solipsis@pitrou.net> wrote:
On Tue, 21 Apr 2020 18:27:41 +1200 Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
On 21/04/20 10:23 am, Eric Snow wrote:
with the current spec channels get automatically closed sooner, effectively as soon as all wrapping objects *that were used* are garbage collected (or released).
Maybe I'm missing something, but just because an object hasn't been used *yet* doesn't mean it isn't going to be used in the future, so isn't this wildly wrong?
That's my concern indeed. An interpreter may be willing to wait for incoming data in the future, without needing it immediately.
(that incoming data may even represent something very trivial, such as a request to terminate itself)
Yeah, I had that same realization yesterday, and it didn't change after sleeping on it. I suppose the only question I have left is whether there is value to users in knowing which interpreters have *used* a particular channel. -eric
On Tue, 21 Apr 2020 09:36:22 -0600 Eric Snow <ericsnowcurrently@gmail.com> wrote:
On Tue, Apr 21, 2020 at 2:18 AM Antoine Pitrou <solipsis@pitrou.net> wrote:
On Tue, 21 Apr 2020 18:27:41 +1200 Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
On 21/04/20 10:23 am, Eric Snow wrote:
with the current spec channels get automatically closed sooner, effectively as soon as all wrapping objects *that were used* are garbage collected (or released).
Maybe I'm missing something, but just because an object hasn't been used *yet* doesn't mean it isn't going to be used in the future, so isn't this wildly wrong?
That's my concern indeed. An interpreter may be willing to wait for incoming data in the future, without needing it immediately.
(that incoming data may even represent something very trivial, such as a request to terminate itself)
Yeah, I had that same realization yesterday, and it didn't change after sleeping on it. I suppose the only question I have left is whether there is value to users in knowing which interpreters have *used* a particular channel.
I don't think so :-) Regards Antoine.
On Tue, Apr 21, 2020 at 10:33 AM Antoine Pitrou <solipsis@pitrou.net> wrote:
On Tue, 21 Apr 2020 09:36:22 -0600 Eric Snow <ericsnowcurrently@gmail.com> wrote:
Yeah, I had that same realization yesterday, and it didn't change after sleeping on it. I suppose the only question I have left is whether there is value to users in knowing which interpreters have *used* a particular channel.
I don't think so :-)
Yeah, as with so much with this PEP, if it proves desirable then we can add it later. -eric
Hi, Le sam. 18 avr. 2020 à 19:16, Antoine Pitrou <solipsis@pitrou.net> a écrit :
Mostly, I hope that by making the subinterpreters functionality available to pure Python programmers (while it was formerly an advanced and arcane part of the C API), we will spur a bunch of interesting third-party experimentations, including possibilities that we on python-dev have not thought about. (...) * I think the module should indeed be provisional. Experimentation may discover warts that call for a change in the API or semantics. Let's not prevent ourselves from fixing those issues.
Would it make sense to start by adding the module as a private "_subinterpreters" module but document it? The "_" prefix would be a reminder that "hey! it's experimental, there is no backward-compatibility guarantee there". We can also add a big warning in the documentation. Victor
On Tuesday, April 21, 2020 9:20 AM Victor Stinner [mailto:vstinner@python.org] wrote
Hi,
Le sam. 18 avr. 2020 à 19:16, Antoine Pitrou <solipsis@pitrou.net> a écrit :
Mostly, I hope that by making the subinterpreters functionality available to pure Python programmers (while it was formerly an advanced and arcane part of the C API), we will spur a bunch of interesting third-party experimentations, including possibilities that we on python-dev have not thought about. (...) * I think the module should indeed be provisional. Experimentation may discover warts that call for a change in the API or semantics. Let's not prevent ourselves from fixing those issues.
Would it make sense to start by adding the module as a private "_subinterpreters" module but document it? The "_" prefix would be a reminder that "hey! it's experimental, there is no backward-compatibility guarantee there".
We can also add a big warning in the documentation.
Victor
What about requiring "from __future__ import subinterpreters" to use this? According to the docs, the purpose of __future__ is "to document when incompatible changes were introduced", and it does seem that this would be an incompatible change for some C extensions. --Edwin
__future__ imports only have effects on the parser and compiler. PEP 554 is mostly a Python module, currently named "_xxsubinterpreters". Victor Le mar. 21 avr. 2020 à 15:37, Edwin Zimmerman <edwin@211mainstreet.net> a écrit :
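Victor's point is visible in the stdlib itself: `__future__` is an ordinary module that merely records, for each feature, the releases involved and the flag the compiler checks. A short demonstration using the `barry_as_FLUFL` feature (chosen because its effect on the parser is directly observable):

```python
import __future__

# "from __future__ import X" is a compiler/parser directive, not a runtime
# import: the __future__ module only records each feature's compiler flag
# and the releases where it became optional/mandatory.
feat = __future__.barry_as_FLUFL           # easter-egg feature: the "<>" operator
code = compile("1 <> 2", "<demo>", "eval", flags=feat.compiler_flag)
print(eval(code))                          # True: with the flag, "<>" parses as !=

try:
    compile("1 <> 2", "<demo>", "eval")    # without the flag: SyntaxError
except SyntaxError:
    print("SyntaxError without the flag")
```

Since a __future__ import can only toggle compilation behavior like this, it can't gate the availability of a whole runtime module, which is why it doesn't fit the subinterpreters case.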
On Tuesday, April 21, 2020 9:20 AM Victor Stinner [mailto:vstinner@python.org] wrote
Hi,
Le sam. 18 avr. 2020 à 19:16, Antoine Pitrou <solipsis@pitrou.net> a écrit :
Mostly, I hope that by making the subinterpreters functionality available to pure Python programmers (while it was formerly an advanced and arcane part of the C API), we will spur a bunch of interesting third-party experimentations, including possibilities that we on python-dev have not thought about. (...) * I think the module should indeed be provisional. Experimentation may discover warts that call for a change in the API or semantics. Let's not prevent ourselves from fixing those issues.
Would it make sense to start by adding the module as a private "_subinterpreters" module but document it? The "_" prefix would be a reminder that "hey! it's experimental, there is no backward-compatibility guarantee there".
We can also add a big warning in the documentation.
Victor
What about requiring "from __future__ import subinterpreters" to use this? According to the docs, the purpose of __future__ is "to document when incompatible changes were introduced", and it does seem that this would be an incompatible change for some C extensions. --Edwin
-- Night gathers, and now my watch begins. It shall not end until my death.
On Tue, Apr 21, 2020 at 7:25 AM Victor Stinner <vstinner@python.org> wrote:
Would it make sense to start by adding the module as a private "_subinterpreters" module but document it? The "_" prefix would be a reminder that "hey! it's experimental, there is no backward-compatibility guarantee there".
I would expect a leading underscore to be confusing (as well as conflicting with the name of the low-level module). If we did anything, then it would probably make more sense to name the module something like "interpreters_experimental". However, I'm not sure that offers much benefit.
We can also add a big warning in the documentation.
We will mark it "provisional" in the docs, which I expect will include info on what that means and why it is provisional. -eric
Eric Snow wrote:
We will mark it "provisional" in the docs, which I expect will include info on what that means and why it is provisional.
If you'd like an example format for marking a section of the docs as provisional w/ reST, something like this at the top should suffice (with perhaps something more specific to the subinterpreters module):

.. note::
   This section of the documentation and all of its members have been
   added *provisionally*. For more details, see :term:`provisional api`.

:term:`provisional api` generates a link to https://docs.python.org/3/glossary.html#term-provisional-api. On Tue, Apr 21, 2020 at 12:09 PM Eric Snow <ericsnowcurrently@gmail.com> wrote:
On Tue, Apr 21, 2020 at 7:25 AM Victor Stinner <vstinner@python.org> wrote:
Would it make sense to start by adding the module as a private "_subinterpreters" module but document it? The "_" prefix would be a reminder that "hey! it's experimental, there is no backward-compatibility guarantee there".
I would expect a leading underscore to be confusing (as well as conflicting with the name of the low-level module). If we did anything, then it would probably make more sense to name the module something like "interpreters_experimental". However, I'm not sure that offers much benefit.
We can also add a big warning in the documentation.
We will mark it "provisional" in the docs, which I expect will include info on what that means and why it is provisional.
-eric
On Wed, Apr 22, 2020 at 2:13 AM Kyle Stanley <aeros167@gmail.com> wrote:
If you'd like an example format for marking a section of the docs as provisional w/ reST, something like this at the top should suffice (with perhaps something more specific to the subinterpreters module):
.. note::
   This section of the documentation and all of its members have been
   added *provisionally*. For more details, see :term:`provisional api`.
:term:`provisional api` generates a link to https://docs.python.org/3/glossary.html#term-provisional-api.
Thanks! -eric
participants (7)

- Antoine Pitrou
- Edwin Zimmerman
- Eric Snow
- Greg Ewing
- Jim J. Jewett
- Kyle Stanley
- Victor Stinner