Re: [Python-ideas] async objects

6 Oct 2016

On Wed, Oct 5, 2016 at 1:28 PM, Rene Nejsum wrote:
> ...
> When I first read about the async idea, I initially expected that it would be some Stackless-like addition to Python. My wish for Python was an addition to the language that allowed an easy and elegant concurrent model on the language level. Ideally a Python program with 1000 async objects parsing a 10TB XML in-memory file should run twice as fast on an 8-core CPU, compared to a 4-core ditto.

I think there are two fundamentally different layers getting conflated
here, which is really confusing the issue.

Layer 1 is the user API for concurrency. At this layer, there are two
major options in current Python.

The first option is the "implicit interleaving" model provided by
classic threads, Stackless, gevent, goroutines, etc., where as a user
you write regular "serial" code + some calls to thread spawning
primitives, and then the runtime magically arranges for multiple
pieces of "serial" code to run in some kind of concurrent/parallel
fashion.
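
A minimal sketch of this style, assuming CPython's standard threading
module (the function name count_words and the sample data are invented
for illustration). The worker is plain serial code; the only
concurrency-specific parts are the spawn and join calls:

```python
import threading

def count_words(name, text, results):
    # Ordinary "serial" code: nothing here marks where it may be suspended.
    results[name] = len(text.split())

documents = {"a": "one two three", "b": "four five", "c": "six"}
results = {}

# The only concurrency-specific code is spawning and joining the threads;
# the runtime decides how the three calls are interleaved.
threads = [
    threading.Thread(target=count_words, args=(name, text, results))
    for name, text in documents.items()
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)
```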

One downside of this approach is that because the runtime gets to
arbitrarily decide how to interleave the execution of these different
pieces of code, it can be difficult for the user to reason about
interactions between them. So this motivated the second option for
user APIs: the "explicit interleaving" model where as a user you
annotate your code with some sort of marker saying where it's willing
to be suspended (Python uses the "await" keyword), and then the
runtime is restricted to only running one piece of code at a time, and
only switching between them at these explicitly marked points. (The
canonical reference on this is
https://glyph.twistedmatrix.com/2014/02/unyielding.html)
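
A small sketch of what this looks like with asyncio (the names worker
and log are hypothetical, chosen only for this example). Each coroutine
runs uninterrupted until it reaches an "await", and that is the only
point where the other one can take over:

```python
import asyncio

async def worker(name, log):
    for step in range(2):
        # Runs without interruption: no other coroutine can execute here.
        log.append(f"{name} step {step}")
        # The only place where another coroutine is allowed to run.
        await asyncio.sleep(0)

async def main():
    log = []
    await asyncio.gather(worker("A", log), worker("B", log))
    return log

log = asyncio.run(main())
print(log)
```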

(I like to think about this as opt-out concurrency vs opt-in
concurrency: the first model is concurrent by default except where you
explicitly use a mutex; the second is serial by default except where
you explicitly use "await".)
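
To make the opt-out/opt-in contrast concrete, here is a rough sketch
that increments a shared counter both ways (all names are made up for
the example). The threaded version needs a mutex around the
read-modify-write; the asyncio version needs none, because there is no
"await" in the middle of the update:

```python
import asyncio
import threading

# Opt-out: threads are concurrent by default, so the read-modify-write
# on the shared counter is protected with a mutex.
thread_total = 0
lock = threading.Lock()

def add_in_thread(n):
    global thread_total
    for _ in range(n):
        with lock:
            thread_total += 1

threads = [threading.Thread(target=add_in_thread, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Opt-in: coroutines are serial by default, so no lock is needed as long
# as there is no "await" in the middle of the read-modify-write.
async_total = 0

async def add_in_coroutine(n):
    global async_total
    for _ in range(n):
        async_total += 1        # cannot be interrupted: no await here
        await asyncio.sleep(0)  # other coroutines may run only here

async def main():
    await asyncio.gather(*(add_in_coroutine(1000) for _ in range(4)))

asyncio.run(main())
print(thread_total, async_total)
```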

So that's the user API level. Then there's Layer 2, the strategies
that the runtime underneath uses to implement whichever semantics are
in play. There are a lot of options here -- in particular, within the
"implicit interleaving" model Python has existing production-ready
implementations using OS level threads with a GIL (CPython's threading
module), clever C stack manipulation tricks on a single OS level
thread (gevent), OS level threads without a GIL (Jython's threading
module), etc., etc. Picking between these is an implementation
trade-off, not a language-level semantics trade-off -- from the point
of view of the user API, they're pretty much interchangeable.

...And in principle you could also use any of these options to
implement the "explicit interleaving" approach. For example, each
coroutine could get assigned its own OS level thread, and then to get
the 'await' semantics you could have a shared global lock that gets
dropped when entering an 'await' and then re-acquired afterwards. This
would be silly and inefficient compared to what asyncio actually does
(it uses a single thread, like gevent), so no-one would do this. But
my point is that at the user API level, again, these are just
implementation details -- this would be a valid way to implement the
async/await semantics.
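
Just to illustrate that this would work, here is a toy sketch of the
thread-per-coroutine idea using only the threading module. The helpers
checkpoint and run_as_task are hypothetical names invented here, not
part of any library, and checkpoint is only a stand-in for a real
'await':

```python
import threading
import time

big_lock = threading.Lock()  # plays the role of the shared global lock

def checkpoint():
    # Stand-in for 'await': drop the lock so another thread may run,
    # then take it back before continuing.
    big_lock.release()
    time.sleep(0)  # scheduling hint only; which thread runs next is unspecified
    big_lock.acquire()

def run_as_task(func, *args):
    # Each "coroutine" gets its own OS thread, but only runs while
    # holding the lock.
    def runner():
        with big_lock:
            func(*args)
    thread = threading.Thread(target=runner)
    thread.start()
    return thread

counter = 0

def bump(times):
    global counter
    for _ in range(times):
        value = counter   # read ...
        value += 1
        counter = value   # ... and write back, with no checkpoint in between
        checkpoint()      # the only place a switch can happen

tasks = [run_as_task(bump, 500) for _ in range(4)]
for t in tasks:
    t.join()
print(counter)
```

Even though four OS threads are involved, the unlocked-looking
read-modify-write in bump is safe, because a thread can only lose
control at a checkpoint.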

So what can we conclude from all this?

First, if your goal is to write code that gets faster when you add
more CPU cores, then that means you're looking for a particular
implementation strategy: you want OS level threads, and no GIL.

One way to do this would be to keep the Python language semantics the
same, while modifying CPython's implementation to remove the GIL. This
turns out to be really hard :-). But Jython demonstrates that the
existing APIs are sufficient to make it possible -- the difficulties
are in the CPython implementation, not in the language, so that's
where it would need to be fixed. If someone wants to push this forward,
probably the thing to do is to see how Larry's "gilectomy" project is
doing and help it along.

Another strategy would be to come up with some new user API that can
be added to the language, and whose semantics are more amenable to
no-GIL-multithreading. There are lots of somewhat nascent ideas out
there -- IIRC Eric's been thinking about using subinterpreters to add
shared-nothing threads (versus the shared-everything threads which
Python currently supports -- shared nothing is what Erlang does),
there's Armin's experiments with STM in PyPy, there's PyParallel, etc.
Nick has a good summary:
http://python-notes.curiousefficiency.org/en/latest/python3/multicore_python...

But -- and this is the main point I've been leading up to --
async/await is *not* the new user-level API that you're looking for.
Async/await were created to enable the "explicitly interleaved" style
of programming, which as we saw above effectively takes the GIL and
promotes it to becoming an explicit part of the user API, instead of
an implementation detail of the runtime. This is the one and only
reason async/await exist -- if you don't want to explicitly control
where your code can switch "threads" and be guaranteed that no other
code is running at the same time, then there is no reason to use
async/await.

So I think the objection to async/await on the grounds that they
clutter up the code is based on a misunderstanding of what they're
for. It wasn't that we created these keywords to solve some
implementation problem and then inflicted them on users. It's exactly
the other way around. *If* you as a user want to add some explicit
annotations to your code to control how parallel execution can be
interleaved, *then* there have to be some keywords to write those
annotations, and that's what async/await are. And OTOH if you *don't*
want to have markers in your code to explicitly control interleaving
-- if you prefer the "implicit interleaving" style -- then async/await
are irrelevant and you shouldn't use them, you should use
threading/gevent/whatever.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
