On Wed, Oct 5, 2016 at 1:28 PM, Rene Nejsum wrote:
When I first read about the async idea, I initially expected that it would be some Stackless-like additions to Python. My wish for Python was an addition to the language that allowed an easy and elegant concurrent model on the language level. Ideally a Python program with 1000 async objects parsing a 10 TB XML in-memory file should run twice as fast on an 8-core CPU, compared to a 4-core ditto.

I think there are two fundamentally different layers getting conflated here, which is really confusing the issue.

Layer 1 is the user API for concurrency. At this layer, there are two major options in current Python. The first option is the "implicit interleaving" model provided by classic threads, stackless, gevent, goroutines, etc., where as a user you write regular "serial" code + some calls to thread spawning primitives, and then the runtime magically arranges for multiple pieces of "serial" code to run in some kind of concurrent/parallel fashion.

One downside of this approach is that because the runtime gets to arbitrarily decide how to interleave the execution of these different pieces of code, it can be difficult for the user to reason about interactions between them. So this motivated the second option for user APIs: the "explicit interleaving" model where as a user you annotate your code with some sort of marker saying where it's willing to be suspended (Python uses the "await" keyword), and then the runtime is restricted to only running one piece of code at a time, and only switching between them at these explicitly marked points. (The canonical reference on this is https://glyph.twistedmatrix.com/2014/02/unyielding.html)

(I like to think about this as opt-out concurrency vs opt-in concurrency: the first model is concurrent by default except where you explicitly use a mutex; the second is serial by default except where you explicitly use "await".)

So that's the user API level. Then there's Layer 2, the strategies that the runtime underneath uses to implement whichever semantics are in play. There are a lot of options here -- in particular, within the "implicit interleaving" model Python has existing production-ready implementations using OS level threads with a GIL (CPython's threading module), clever C stack manipulation tricks on a single OS level thread (gevent), OS level threads without a GIL (Jython's threading module), etc., etc.
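
To make the opt-out vs opt-in distinction concrete, here is a minimal sketch using only the standard library. The function names (run_with_threads, run_with_coroutines) and the shared-counter workload are made up for illustration; the only point is where the "don't interrupt me here" and "you may switch here" markers go in each model.

```python
import asyncio
import threading


def run_with_threads(n_workers, steps):
    """Opt-out concurrency: switches can happen anywhere, so we use a mutex."""
    state = {"total": 0}
    lock = threading.Lock()

    def worker():
        for _ in range(steps):
            with lock:  # explicitly opt out of interleaving here
                state["total"] += 1

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["total"]


def run_with_coroutines(n_workers, steps):
    """Opt-in concurrency: switches happen only at "await"."""
    state = {"total": 0}

    async def worker():
        for _ in range(steps):
            # No "await" between the read and the write, so nothing else
            # can run in between and no lock is needed.
            current = state["total"]
            state["total"] = current + 1
            await asyncio.sleep(0)  # explicitly opt in to a possible switch

    async def main():
        await asyncio.gather(*(worker() for _ in range(n_workers)))

    asyncio.run(main())
    return state["total"]


if __name__ == "__main__":
    print(run_with_threads(4, 1000))     # 4000
    print(run_with_coroutines(4, 1000))  # 4000
```

Both versions end up with the same total. In the threaded one the code is safe because of the mutex; in the coroutine one it is safe because there is no "await" inside the read-modify-write.
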
Picking between these runtime strategies is an implementation trade-off, not a language-level semantics trade-off -- from the point of view of the user API, they're pretty much interchangeable.

...And in principle you could also use any of these options to implement the "explicit interleaving" approach. For example, each coroutine could get assigned its own OS level thread, and then to get the 'await' semantics you could have a shared global lock that gets dropped when entering an 'await' and then re-acquired afterwards. This would be silly and inefficient compared to what asyncio actually does (it uses a single thread, like gevent), so no-one would do this. But my point is that at the user API level, again, these are just implementation details -- this would be a valid way to implement the async/await semantics.

So what can we conclude from all this?

First, if your goal is to write code that gets faster when you add more CPU cores, then that means you're looking for a particular implementation strategy: you want OS level threads, and no GIL.

One way to do this would be to keep the Python language semantics the same, while modifying CPython's implementation to remove the GIL. This turns out to be really hard :-). But Jython demonstrates that the existing APIs are sufficient to make it possible -- the difficulties are in the CPython implementation, not in the language, so that's where it would need to be fixed. If someone wants to push this forward, probably the thing to do is to see how Larry's "gilectomy" project is doing and help it along.

Another strategy would be to come up with some new user API that can be added to the language, and whose semantics are more amenable to no-GIL-multithreading.
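
Going back to the thread-per-coroutine thought experiment above: here is a rough sketch of what it could look like. This is emphatically not how asyncio works, and every name in it (Switch, drive, run_all, bump) is hypothetical. It just gives each "async def" coroutine its own OS level thread, holds one shared lock while the coroutine body runs, and drops and re-acquires that lock whenever the coroutine reaches an "await".

```python
import threading


class Switch:
    """Hypothetical awaitable: "await Switch()" marks a suspension point."""

    def __await__(self):
        yield  # hand control back to whoever is driving the coroutine


def drive(coro, big_lock, results, index):
    """Run one coroutine to completion on the current OS thread."""
    with big_lock:  # hold the shared lock while coroutine code runs
        while True:
            try:
                coro.send(None)  # run until the next "await" that suspends
            except StopIteration as stop:
                results[index] = stop.value
                return
            # We are at an "await": drop the lock so another thread's
            # coroutine may run, then take it back before continuing.
            big_lock.release()
            big_lock.acquire()


def run_all(coros):
    """Give every coroutine its own thread and wait for all of them."""
    big_lock = threading.Lock()
    results = [None] * len(coros)
    threads = [
        threading.Thread(target=drive, args=(coro, big_lock, results, i))
        for i, coro in enumerate(coros)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


async def bump(state, steps):
    for _ in range(steps):
        # Safe without a mutex of our own: there is no "await" between
        # the read and the write, and the shared lock is held here.
        current = state["total"]
        state["total"] = current + 1
        await Switch()
    return steps


if __name__ == "__main__":
    state = {"total": 0}
    print(run_all([bump(state, 500) for _ in range(4)]))  # [500, 500, 500, 500]
    print(state["total"])  # 2000
```

From the point of view of the code inside bump, the guarantees are the same as under a single-threaded event loop: only one coroutine runs at a time, and switches happen only at "await". The only difference is that this version pays for four OS threads to get there.
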
There are lots of somewhat nascent ideas out there for such an API -- IIRC Eric's been thinking about using subinterpreters to add shared-nothing threads (versus the shared-everything threads which Python currently supports -- shared nothing is what Erlang does), there's Armin's experiments with STM in PyPy, there's PyParallel, etc. Nick has a good summary: http://python-notes.curiousefficiency.org/en/latest/python3/multicore_python...

But -- and this is the main point I've been leading up to -- async/await is *not* the new user-level API that you're looking for. Async/await were created to enable the "explicitly interleaved" style of programming, which as we saw above effectively takes the GIL and promotes it to becoming an explicit part of the user API, instead of an implementation detail of the runtime. This is the one and only reason async/await exist -- if you don't want to explicitly control where your code can switch "threads" and be guaranteed that no other code is running at the same time, then there is no reason to use async/await.

So I think the objection to async/await on the grounds that they clutter up the code is based on a misunderstanding of what they're for. It wasn't that we created these keywords to solve some implementation problem and then inflicted them on users. It's exactly the other way around. *If* you as a user want to add some explicit annotations to your code to control how parallel execution can be interleaved, *then* there have to be some keywords to write those annotations, and that's what async/await are. And OTOH if you *don't* want to have markers in your code to explicitly control interleaving -- if you prefer the "implicit interleaving" style -- then async/await are irrelevant and you shouldn't use them, you should use threading/gevent/whatever.

-n

--
Nathaniel J. Smith -- https://vorpus.org