
Having followed Yury Selivanov's proposal to add async/await to Python (PEP 492: Coroutines with async and await syntax, and PEP 525: Asynchronous Generators), and especially the discussion about PEP 530: Asynchronous Comprehensions, I would like to add some concerns about the direction Python is taking on this.

As Sven R. Kunze mentions, there is a risk of having to double a lot of methods/functions to have an async implementation. Just look at the mess in .NET when Microsoft introduced async/await in their library: a huge number of functions had to be implemented with an async version of each member. Definitely not the DRY principle.

While I think parallelism and concurrency are very important features in a language, I feel the direction Python is taking right now is getting too complicated, being difficult to understand and implement correctly.

I thought it might be worth looking at using async at a higher level. Instead of making methods, generators and lists async, why not make the object itself async? Meaning that the method call (message to the object) is async. Example:

    class SomeClass(object):
        def some_method(self):
            return 42

    o = async SomeClass()  # Indicating that the user wants an async version of the object
    r = o.some_method()    # Will implicitly be an async/await "wrapped" method, no matter the impl.
    # Here other code could execute, until the result (r) is referenced
    print(r)

I think the above code is easier to implement, use and understand, while it handles some of the use cases handled by defining a lot of methods as async/await.

I have made a small implementation called PYWORKS (https://github.com/pylots/pyworks), somewhat based on the idea above. PYWORKS has been used in several real-world implementations and seems to be fairly easy for developers to understand and use.

br /Rene

PS. This is my first post to python-ideas, please be gentle :-)
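
[For readers who want to experiment with the idea: something close to the proposed behaviour can be sketched today with concurrent.futures. AsyncProxy is an illustrative name, not part of any existing library; a single worker thread is used so calls still execute one at a time, in order, actor-style.]

    # A rough library-level approximation of "o = async SomeClass()",
    # assuming threads via concurrent.futures (names are illustrative only).
    from concurrent.futures import ThreadPoolExecutor

    class AsyncProxy:
        def __init__(self, obj):
            self._obj = obj
            # One worker thread, so method calls run one at a time, in order
            self._executor = ThreadPoolExecutor(max_workers=1)

        def __getattr__(self, name):
            attr = getattr(self._obj, name)
            if callable(attr):
                # Every method call is submitted to the worker and
                # immediately returns a Future instead of the result
                return lambda *args, **kwds: self._executor.submit(attr, *args, **kwds)
            return attr

    class SomeClass(object):
        def some_method(self):
            return 42

    o = AsyncProxy(SomeClass())  # stand-in for "o = async SomeClass()"
    r = o.some_method()          # returns a Future immediately
    # Here other code could execute...
    print(r.result())            # block only when the result is needed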

Independently from what the proposed solution is, I think you raised a very valid concern: the DRY principle.

Right now the stdlib has tons of client network libraries which do not support the new async model. As such, library vendors will have to rewrite them using the new syntax, provide an "aftplib", an "ahttplib" etc., and release them as third-party libs hosted on PyPI. This trend is already happening as we speak: https://github.com/python/asyncio/wiki/ThirdParty#clients

It would be awesome if somehow the Python stdlib itself provided some mechanism to make the existing "batteries" able to run asynchronously, so that, say, ftplib or httplib could be used with asyncio as the base IO loop while maintaining the same existing API. Gevent tried to do the same thing with http://www.gevent.org/gevent.monkey.html

As for *how* to do that, I'm sorry to say that I really have no idea. It's a complicated issue, but I think it's good that this has been raised.

On Sun, Oct 2, 2016 at 3:26 PM, Rene Nejsum <rene@stranden.com> wrote:
-- Giampaolo - http://grodola.blogspot.com

On 3 October 2016 at 15:52, Giampaolo Rodola' <g.rodola@gmail.com> wrote:
There's https://sans-io.readthedocs.io/ which proposes an approach to solving this issue. Paul

The way I see it, the great thing about async/await as opposed to threading is that it is explicit about when execution will "take a break" from your function or resume into it. This is made clear and readable through the use of `await` keywords.

Your proposal unfortunately goes directly against this idea of explicitness. You won't know whether a function will need to be fed into an event loop or not. You won't know where your code is going to lose or gain control.

On Sun, Oct 2, 2016, 14:26 Rene Nejsum <rene@stranden.com> wrote:
-- Yann Kaiser kaiser.yann@gmail.com yann.kaiser@efrei.net +33 6 51 64 01 89 https://github.com/epsy

On Tue, Oct 4, 2016 at 8:32 AM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
That keeps getting talked about, but one thing I've never seen is any sort of benchmark showing (probably per operating system) how many concurrent requests you need to have before threads become unworkable.

Maybe calculate it in milli-Wikipedias, on the basis that English Wikipedia is a well-known site and has some solid stats published [1]. Of late, it's been seeing about 8e9 hits per month, or about 3000/second. So one millipedia would be three page requests per second.

A single-core CPU running a simple and naive Python web application can probably handle several millipedias without any concurrency at all. (I would define "handle" as "respond to requests without visible queueing", which would mean about 50ms, for a guess - could maybe go to 100 or even 250, but then people might start noticing the slowness.)

Once you're handling more requests than you can handle on a single thread, you need concurrency, but threading will do you fine for a while. Then at some further mark, threading is no longer sufficient, and you need something lighter-weight, such as asyncio. But has anyone ever measured what those two points are?

ChrisA

[1] http://reportcard.wmflabs.org/
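
[A crude starting point for that measurement might look like the sketch below, which just times N sleeping workers under threading and under asyncio; it ignores memory and real request handling, so treat the numbers as illustrative at best.]

    import asyncio, threading, time

    N = 1000  # number of concurrent "requests"

    def sync_worker():
        time.sleep(1)

    async def async_worker():
        await asyncio.sleep(1)

    start = time.perf_counter()
    threads = [threading.Thread(target=sync_worker) for _ in range(N)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print("threads:", time.perf_counter() - start)

    start = time.perf_counter()
    loop = asyncio.get_event_loop()
    loop.run_until_complete(asyncio.gather(*[async_worker() for _ in range(N)]))
    print("asyncio:", time.perf_counter() - start)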

My implementation is, but it should not (have to) be, it only reflects my limited ability and time :-) The programmer should not need to be aware of whether concurrency is achieved through coroutines or threads; ideally there should be one OS thread per core in the CPU, running many (millions) of coroutines… br /Rene

Hi Yann,
On 03 Oct 2016, at 17:46, Yann Kaiser <kaiser.yann@gmail.com> wrote:
The way I see it, the great thing about async/await as opposed to threading is that it is explicit about when execution will "take a break" from your function or resume into it. This is made clear and readable through the use of `await` keywords.
The way I read this argument, a parallel could be “the great thing about alloc/free is that it is explicit about when allocation will happen”, but I believe that the more control you can leave to the runtime the better.
Your proposal unfortunately goes directly against this idea of explicitness. You won't know whether a function will need to be fed into an event loop or not. You won't know where your code is going to lose or gain control.
I believe that you should be able to write concurrent code without being too explicit about it, letting the runtime handle low-level timing, as long as you know your code will execute in the intended order. br /Rene

On Oct 3, 2016 7:09 PM, "Stephen J. Turnbull" <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
They are referring to the synchronous nature of any independent control state. Whether it's a thread, a coroutine, a continuation, or whatever else doesn't really matter much. When a thing runs concurrently alongside other things, it's still synchronous with respect to itself regardless of how many context switches occur before completion. Such things only need mechanisms to synchronize in order to cooperate.

People want to know how they are supposed to write unified, non-insane-and-ugly code in this a/sync python 2/3 world we now find ourselves in. I've been eagerly watching this thread for the answer, thus far to no avail.

Sans-io suggests we write bite-sized synchronous code that can be driven by a/sync consumers. While this is all well and good, how does one write said consuming library for both I/O styles without duplication? The answer seems to be "write everything you ever wanted as async and throw some sync wrappers around it". Which means all the actual code I write will be peppered with async and await keywords.

In Go I can spawn a new control state (goroutine) at any time against any function. This is clear in the code. In Erlang I can spawn a new control state (Erlang process) at any time and it's also clear. Erlang is a little different because it will preempt me, but the point is I am simply choosing a target function to run in a new context. Gevent and even the threading module are another example of this pattern.

In all reality you don't typically need many suspension points other than around I/O, and occasionally heavy CPU, so I think folks are struggling to understand (I admit, myself included) why the runtime doesn't want to be more help and instead punts back to the developer.

-- C Anthony

On 4 October 2016 at 10:48, C Anthony Risinger <anthony@xtfx.me> wrote:
Right, this thread is more about "imperative shell, asynchronous execution", than it is event driven servers. http://www.curiousefficiency.org/posts/2015/07/asyncio-background-calls.html and the code at https://bitbucket.org/ncoghlan/misc/src/default/tinkering/background_tasks.p... gives an example of doing that with "schedule_coroutine", "run_in_foreground" and "call_in_background" helpers to drive the event loop.
Because the asynchronous features are mostly developed by folks working on event driven servers, and the existing synchronous APIs are generally fine if you're running from a synchronous shell. That leads to the following calling models being reasonably well-standardised:

- non-blocking synchronous from anywhere: just call it
- blocking synchronous from synchronous: just call it
- asynchronous from asynchronous: use await
- blocking synchronous from asynchronous: use "loop.run_in_executor()" on the event loop

The main arguable aspect there is "loop.run_in_executor()" being part of the main user facing API, rather than offering a module level `asyncio.call_in_background` helper function.

What's not well-defined are the interfaces for calling into asynchronous code from synchronous code. The most transparent interface for that is gevent and the underlying greenlet support, which implement that at the C stack layer, allowing arbitrary threads to be suspended at arbitrary points. This doesn't give you any programming model benefits, it's just a lighter weight form of operating system level pre-emptive threading (see http://python-notes.curiousefficiency.org/en/latest/pep_ideas/async_programm... for more on that point).

The next most transparent would be to offer a more POSIX-like shell experience, with the concepts of foreground and background jobs, and the constraint that the background jobs scheduled in the current thread only run while a foreground task is active.

As far as I know, the main problems that can currently arise with that latter approach are when you attempt to run something in the foreground, but the event loop in the current thread is already running.

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
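
[For concreteness, the "blocking synchronous from asynchronous" case above looks roughly like this minimal sketch, using the default thread pool executor.]

    import asyncio, time

    def blocking_call():
        time.sleep(1)  # stands in for any blocking synchronous API
        return "done"

    async def main():
        loop = asyncio.get_event_loop()
        # The blocking call runs in the default executor (a thread pool),
        # leaving the event loop free to run other coroutines meanwhile
        result = await loop.run_in_executor(None, blocking_call)
        print(result)

    asyncio.get_event_loop().run_until_complete(main())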

Nick Coghlan writes:
What's not well-defined are the interfaces for calling into asynchronous code from synchronous code.
I don't understand the relevance to the content of the thread. As I understand the main point, Sven and Rene don't believe that [the kind of] async code [they want to write] should need any keywords; just start the event loop and invoke functions, and that somehow automatically DTRTs. (I.e., AFAICS the intent is to unify generators and coroutines despite the insistence of Those Who Have Actually Implemented Stuff that generator != coroutine.)

N.B. As I understand it, although Rene uses the async keyword when invoking the constructor, this could just as well be done with a factory function, since he speaks of "wrapping" the object. And his example is in your "just call it" category: nonblocking synchronous code. That doesn't help me understand what he's really trying to do.

His PyWorks project is documented as implementing the "actor" model, but async is more general than that AFAICS, and on the other hand I can't see how you can guarantee that a Python function won't modify global state. So OK, I can see that a performant implementation of the actor pattern (don't we have this in multiprocessing somewhere?) with a nice API (that's harder :-) and documented restrictions on what you can do in there might be a candidate for the stdlib, but I don't see how it's related to the "async(io)" series of PEPs, which are specifically about interleaving arbitrary amounts of suspension in a Python program (which might manipulate global state, but we want to do it in a way such that we know that code between suspension points executes "atomically" from the point of view of other coroutines).

Anthony also objects to the keywords, i.e., that he'll need to pepper his "dual-purpose" code with "async" and "await". Again, AFAICS that implies that he doesn't see a need to distinguish async from lazy (coroutine from generator), since AFAICS you'd break the world if you changed the semantics of "def foo" to "async def foo". So if you're going to write async/await-style code, you're going to have to swallow the keywords.

Am I just missing something?

On 4 October 2016 at 17:50, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
Given the schedule_coroutine/run_in_foreground distinction, it's relatively easy (for a given definition of easy) to write a proxy object that would make the following work:

    class SomeClass(object):
        def some_sync_method(self):
            return 42

        async def some_async_method(self):
            await asyncio.sleep(3)
            return 42

    o = auto_schedule(SomeClass())  # Indicating that the user wants an async version of the object
    r1 = o.some_sync_method()       # Automatically run in a background thread
    r2 = o.some_async_method()      # Automatically scheduled as a coroutine
    print(run_in_foreground(r1))
    print(run_in_foreground(r2))

It's not particularly useful for an actual event driven server, but it should be entirely do-able for the purposes of providing a common interface over blocking and non-blocking APIs.

What it *doesn't* do, and what you need greenlet for, is making that common interface look like you're using plain old synchronous C threads. If folks really want to do that, that's fine - they just need to add gevent/greenlet as a dependency, just as the folks that don't like the visibly object-oriented nature of the default unittest and logging APIs will often opt for third party alternative APIs that share some of the same underlying infrastructure.

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
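
[A minimal sketch of what such a proxy might look like, assuming an asyncio event loop in the main thread. auto_schedule and run_in_foreground here are illustrative, not necessarily how Nick's helpers are actually written.]

    import asyncio

    def run_in_foreground(task, *, loop=None):
        # Block until the task or coroutine completes, driving the loop
        loop = loop or asyncio.get_event_loop()
        return loop.run_until_complete(asyncio.ensure_future(task, loop=loop))

    class auto_schedule:
        def __init__(self, obj, *, loop=None):
            self._obj = obj
            self._loop = loop or asyncio.get_event_loop()

        def __getattr__(self, name):
            attr = getattr(self._obj, name)
            if asyncio.iscoroutinefunction(attr):
                # Coroutine methods are scheduled as tasks on the loop
                return lambda *a, **kw: asyncio.ensure_future(
                    attr(*a, **kw), loop=self._loop)
            if callable(attr):
                # Synchronous methods run in the default thread pool
                return lambda *a, **kw: self._loop.run_in_executor(
                    None, attr, *a, **kw)
            return attr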

On Tue, Oct 4, 2016 at 4:30 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
So maybe r1 and r2 are just concurrent.futures.Futures, and run_in_foreground(r) wraps r.result(). And auto_schedule() is a proxy that turns all method calls into async calls with a (concurrent) Future to wait for the result. There's an event loop somewhere that sits idle except when you call run_in_foreground() on something; it's only used for the async methods, since the sync methods run in a background thread (pool, I hope).

Or perhaps r2 is an asyncio.Future and run_in_foreground(r2) wraps loop.run_until_complete(r2). I suppose the event loop should also be activated when waiting for r1, so maybe r1 should be an asyncio Future that wraps a concurrent Future (using asyncio.wrap_future(), which can do just that thing).

Honestly it feels like many things can go wrong with this API model, esp. you haven't answered what should happen when a method of SomeClass (either a synchronous one or an async one) calls run_in_foreground() on something -- or, more likely, calls some harmless-looking function that calls another harmless-looking function that calls run_in_foreground(). At that point you have pre-emptive scheduling back in play (or your coroutines may be blocked unnecessarily) and I think you have nothing except a more complicated API to work with threads.

I think I am ready to offer a counterproposal where the event loop runs in one thread and synchronous code runs in another thread and we give the synchronous code a way to synchronously await a coroutine or an asyncio.Future. This can be based on asyncio.run_coroutine_threadsafe(), which takes a coroutine or an asyncio.Future and returns a concurrent Future. (It also takes a loop, and it assumes that loop runs in a different thread. I think it should assert that.)

The main feature of my counterproposal as I see it is that async code should not call back into synchronous code, IOW once you are writing coroutines, you have to use the coroutine API for everything you do. And if something doesn't have a coroutine API, you run it in a background thread using loop.run_in_executor().

So either you buy into the async way of living and it's coroutines all the way down from there, no looking back -- or you stay on the safe side of the fence, and you interact with coroutines only using a very limited "remote manipulator" API. The two don't mix any better than that.

-- --Guido van Rossum (python.org/~guido)
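
[The synchronous side of that fence can already be expressed with asyncio.run_coroutine_threadsafe; a minimal sketch, with the event loop running in a separate thread as described above:]

    import asyncio, threading

    loop = asyncio.new_event_loop()
    threading.Thread(target=loop.run_forever, daemon=True).start()

    # From synchronous code, the "remote manipulator" hands back a
    # concurrent.futures.Future to wait on
    future = asyncio.run_coroutine_threadsafe(asyncio.sleep(1, result=42), loop)
    print(future.result())  # blocks the calling thread, not the loop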

On 5 October 2016 at 02:15, Guido van Rossum <guido@python.org> wrote:
Yeah, that's the main reason I haven't gone beyond this as a toy idea - there are so many ways to get yourself in trouble if you don't already understand the internal details.
Oh, that makes a lot more sense, as we'd end up with a situation where async code gets used in one of two ways:

- asynchronous main thread (the typical way it gets used now)
- synchronous thread with a linked asynchronous helper thread

The key difference between the latter and a traditional thread pool is that there'd only be the *one* helper thread for any given synchronous thread, and as long as the parent thread keeps its hands off any shared data structures while coroutines are running, you can still rely on async/await to interleave access to data structures shared by the coroutines.
+1

I considered suggesting that the "remote manipulator" API could be spelled "await expr", but after starting to write that idea up, realised it was likely a recipe for hard-to-debug problems when folks forget to add the "async" declaration to a coroutine definition.

So that would instead suggest 2 module level functions in asyncio:

* call_in_background(coroutine_or_callable, *args, **kwds):
  - creates the helper thread if it doesn't already exist, stores a reference in a thread local variable
  - schedules coroutines directly in the helper thread's event loop
  - schedules other callables in the helper thread's executor
  - returns an asyncio.Future instance
  - perhaps lets the EventLoopPolicy override this default behaviour?

* wait_for_result:
  - blocking call that waits for asyncio.Future.result() to be ready

Using "call_in_background" from a coroutine would be OK, but somewhat redundant (as if a coroutine is already running, you could just use the current thread's event loop instead). Using "wait_for_result" from a coroutine would be inappropriate, as with any other blocking call.

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
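
[A sketch of how those two helpers might be implemented today; one deviation from the description above is that run_coroutine_threadsafe hands back a concurrent.futures.Future rather than an asyncio.Future.]

    import asyncio, functools, threading

    _local = threading.local()

    def _helper_loop():
        # One helper thread (and loop) per synchronous thread,
        # stored in a thread local variable
        try:
            return _local.loop
        except AttributeError:
            loop = asyncio.new_event_loop()
            threading.Thread(target=loop.run_forever, daemon=True).start()
            _local.loop = loop
            return loop

    def call_in_background(target, *args, **kwds):
        loop = _helper_loop()
        if asyncio.iscoroutine(target):
            # Coroutines go directly to the helper thread's event loop
            return asyncio.run_coroutine_threadsafe(target, loop)
        # Other callables go to the helper loop's default executor
        async def _run():
            return await loop.run_in_executor(
                None, functools.partial(target, *args, **kwds))
        return asyncio.run_coroutine_threadsafe(_run(), loop)

    def wait_for_result(future):
        # Blocking call that waits for the result to be ready
        return future.result()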

I am a little out on deep water here, but I think that if an object instance was guaranteed - by the Python runtime - to run in one coroutine/thread, and only the message passing of method calls and return values was allowed to pass between coroutine/thread contexts, then at least all local instance variable references would be fine?
Maybe not, but I am hoping for something better :-)

On 04.10.2016 13:30, Nick Coghlan wrote:
Maybe this is all a big misunderstanding. asyncio is incompatible with regular execution flow and it's **always blocking**. However, asyncio is perceived by some of us (including me) as a shiny alternative to processes and threads but really isn't. I remember doing this survey on python-ideas (results here: https://srkunze.blogspot.de/2016/02/concurrency-in-python.html) but I get the feeling that we still miss something.

My impression is that asyncio shall be used for something completely different than dropping off things into a background worker. But looking at the cooking example given by Steve Dower (cf. blog post), at other explanations, at examples in the PEPs, it just seems to me that his analogy could have been made with threads and processes as well.

At its core (the ASYNC part), asyncio is quite similar to threads and processes. But its IO part seems to drive some (design) decisions that don't go well with the existing mental model of many developers. Even PEP reviewers are fooled by simple asyncio examples. Why? Because they forget to spawn an event loop. "async def" and "await" are just useless without an event loop. And maybe that's what people's frustration is about. They want the ASYNC part without worrying about the IO part.

Furthermore, adding 2 (TWO) new keywords to a language has such an immense impact. Especially when those people are told "the barrier for new keywords is quite high!!". So, these new keywords must mean something.

I think what would help here are concrete answers to:

0) Is asyncio a niche feature only to be used for better IO?
1) What is the right way of integrating asyncio into existing code?
2) How do we intend to solve the DRY-principle issue?

If the answer is "don't use asyncio", that's a fine result, but honestly I think it would be just insane to assume that we got all these features, all this work and all those duplicated functions all for nothing. I can't believe that. So, I am still looking for a reasonable use-case of asyncio in our environment.

Cheers, Sven

On 04.10.2016 09:50, Stephen J. Turnbull wrote:
I don't think that's actually what I wanted here. One simple keyword should have sufficed, just like in Go. So, the developer gets a way to decide whether or not he needs it blocking or non-blocking **when using a function**. He doesn't need to decide it **when writing the function**.

You might wonder why this is relevant. The DRY principle has been mentioned, but there's more to it. Only the caller **can decide** whether it needs to wait or not. Why? Because the caller works WITH the result of the called function (whatever result means to you). The caller is (what Nick probably would call) the orchestrator, as it has the knowledge about the relation and interaction between domain-specific function calls.

As a result of past discussions, I wrote the module "xfork" which basically does this "golang goroutine" stuff. It's just a thin wrapper around "futures", but it allows one to avoid what René and Anthony object to.

Cheers, Sven
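
[To make the calling-side decision concrete, here is a hypothetical fork() in the spirit of what Sven describes -- illustrative only, not xfork's actual API.]

    from concurrent.futures import ThreadPoolExecutor

    _pool = ThreadPoolExecutor()

    def fork(fn, *args, **kwds):
        # The callee is written as plain synchronous code; the *caller*
        # decides whether to run it concurrently and when to wait
        return _pool.submit(fn, *args, **kwds)

    def compute(x):
        return x * x

    f = fork(compute, 7)  # non-blocking: runs in the background
    # ... other work here ...
    print(f.result())     # the caller decides when (or whether) to wait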

I agree, that's why I proposed putting the async keyword in when creating the object, saying that in this instance I want asynchronous communication with the object.
You might wonder why this is relevant. The DRY principle has been mentioned, but there's more to it. Only the caller **can decide** whether it needs to wait or not. Why? Because the caller works WITH the result of the called function (whatever result means to you). The caller is (what Nick probably would call) the orchestrator, as it has the knowledge about the relation and interaction between domain-specific function calls.
+1
As a result of past discussions, I wrote the module "xfork" which basically does this "golang goroutine" stuff. It's just a thin wrapper around "futures", but it allows one to avoid what René and Anthony object to.
I had a look at xfork, and really like it. It is implemented much like the lower level of PYWORKS, and PYWORKS could build on xfork instead. I think that the “model” of doing async should be defined in the Python language/runtime (like in Go, Erlang, ABCL). In the ideal case it should be up to the runtime implementation (CPython, PyPy, Jython, IronPython etc.) how the asynchronous behaviour is implemented (greenlets, threads, roll-your-own, etc.) br /Rene

Rene Nejsum writes:
On 04 Oct 2016, at 18:40, Sven R. Kunze <srkunze@mail.de> wrote:
I don't believe it's true, but suppose it is. *You don't need syntactic support* (a keyword) for it. Do you? It can all be done conveniently and readably with functions, as you have proved yourself with pyworks and Sven has with xfork, not to forget greenlets and gevent. No?

You could argue that coroutines don't require syntax (keywords) either, but some Very Smart People disagree. I don't understand PEP 492's implementation well, but pretty clearly there are blockers to allowing ordinary __next__ methods doing async calls. There's also the issue mentioned in PEP 3153 that generators don't fit the notion of (self-actuated) producers "pushing" values into other code; they're really about having values pulled out of them.

So PEPs 3156 and 492 are actual extensions to Python's capabilities for compact, readable expression of [a specific idiom/model of] asynchronous execution. They aren't intended for all possible models, just to help with one that is important to a fairly large class of Python programmers.
I think that the model of doing async should be defined in the Python language/runtime (like in Go, Erlang, ABCL).
Why be restrictive? Python already supports many models of concurrency, pretty much filling the space (parallel execution vs. coroutines, shared-state vs. isolated, cooperative vs. preemptive, perhaps there are other dimensions). Why go backward from where we already are?

On 5 October 2016 at 16:49, Rene Nejsum <rene@stranden.com> wrote:
OK, I think there may be a piece of foundational knowledge regarding runtime design that's contributing to the confusion here.

Python's core runtime model is the C runtime model: threads (with a local stack and access to a global process heap) and processes (which contain a heap and one or more threads). Anything else we do (whether it's generators, coroutines, or some other form of paused execution like callback management) gets layered on top of that runtime model. When folks ask questions like "Why can't Python be more like Go?", "Why can't Python be more like Erlang?", or "Why can't Python be more like Rust?" and get a negative response, it's usually because there's an inherent conflict between the C runtime model and whatever piece of the Go/Erlang/Rust runtime model we want to steal.

So the "async" keyword in "async def", "async for" and "async with" is essentially a marker saying "This is not a C-like runtime concept anymore!" (The closest C-ish equivalent I'm aware of would be Apple's Grand Central Dispatch in Objective-C, and that shows many of the async/await characteristics also seen in Python and C#: https://www.raywenderlich.com/60749/grand-central-dispatch-in-depth-part-1 )

Go (as with Erlang before it) avoided these problems by not providing C-equivalent functions in the first place. Accordingly, *every* normal function defined in Go can also be used as a goroutine, rather than needing to be a distinct type - their special case is defining functions that interoperate with external C libraries. Python (along with other languages built on the C runtime model like C# and Objective-C) doesn't have that luxury - we need to distinguish coroutines from regular functions, since we can't just handle them according to the underlying C runtime model any more.

Guido's idea of a shadow thread to let synchronous threads run coroutines without needing to actually run a foreground event loop should provide a manageable way of getting the two runtime models (traditional C and asynchronous coroutines) to play nicely together in a single application, and has the virtue of being something folks can readily experiment with for themselves before we commit to anything specific in the standard library (since all the building blocks of thread local storage, event loop management, and inter-thread message passing primitives are already available).

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 05.10.2016 18:06, Nick Coghlan wrote:
[runtime matters]
I think I understand your point. I also hope that others and I could provide you with our perspective. We see Python not as a C-like runtime but as an abstract modelling language. I know that it's different from the point of view of CPython internals, however from the outside Python appears to be much more than a simple wrapper around C. Just two different perspectives. Unfortunately, your runtime explanations still don't address the DRY issue. :-/
I needed to think about this further when Guido mentioned it. But I like it now. If you check https://github.com/srkunze/fork/tree/asyncio , I already started working on integrating asyncio into xfork a long time ago. But I still couldn't wrap my mind around it and it stalled. But IIRC, I would have implemented a shadow thread solution as well. So, if his idea goes into the stdlib first, I welcome it even more, as it would do the heavy lifting for me. xfork would then be just a common interface to threads, processes and coroutines.

Cheers, Sven

On 10/05/2016 12:20 PM, Sven R. Kunze wrote:
On 05.10.2016 18:06, Nick Coghlan wrote:
At this point I'm willing to bet that you (Sven) are closest to actually having a shadow thread thingy that actually works. Maybe some other asyncio folks would be willing to help you develop it? -- ~Ethan~

Excellent point. For me CPython, Jython, IronPython, PyPy are the same (99.9%) and the important part is Python the language. For a long time I tested PYWORKS against all implementations and was happy that it ran on all of them. Clearly, for others CPython (incl. runtime and C-bindings) is the fact and the others are far from the same, especially because of the missing C integration.

But are the runtimes for Python and Erlang that fundamentally different? Is it Python's tight integration with C that is the big difference?

When I first read about the async idea, I initially expected that it would be some stackless-like additions to Python. My wish for Python was an addition to the language that allowed an easy and elegant concurrent model at the language level. Ideally a Python program with 1000 async objects parsing a 10TB in-memory XML file should run twice as fast on an 8-core CPU, compared to a 4-core ditto.
xfork (as pyworks) implements a proxy object, which “almost” behaves like the real object, but it is still a proxy. If fork (or spawn, chan, async, whatever) was a part of the language it would be cleaner. br /Rene
Cheers, Sven

On 5 October 2016 at 21:28, Rene Nejsum <rene@stranden.com> wrote:
But, are the runtimes for Python and Erlang that fundamentally different? Is it Python’s tight integration with C that is the big difference?
I don't know *that* much about Erlang, but Python's model is that of a single shared address space with (potentially multiple) threads of code running, having access to that address space. Erlang's model is that of multiple threads of execution (processes) that are isolated from each other (they have independent address spaces). That's a pretty fundamental difference, and gets right to the heart of why async is fundamentally different in the two languages.

It also shows in Erlang's C FFI, which as I understand it is to have the C code isolated in a separate "process", and the user's program communicating with it through channels. As far as I can see, that's a direct consequence of the fact that you couldn't safely expect to call a C function (with its direct access to the whole address space) directly from an Erlang process.

Python's model is very similar to C (and Java, and C#/.net, and many other "traditional" languages [1]). That's not "to make it easier to call C functions", it's just because it was a familiar and obvious model to use, known to work well, when Python was first developed. The fact that it made calling C from Python easy was a side effect - one that helped make Python as useful and popular as it is today, but nevertheless simply a side effect of the model.

Paul

[1] And actual computer hardware, which isn't a coincidence :-)

Paul Moore wrote:
I don't know much about Erlang either, but from what I gather, it's a functional language. That removes a lot of potential problems with concurrency right from the beginning. You can't have trouble with mutation of shared state if you can't mutate state in the first place. :-) -- Greg

On Wed, Oct 5, 2016 at 1:28 PM, Rene Nejsum <rene@stranden.com> wrote:
When I first read about the async idea, I initially expected that it would be some stackless-like additions to Python. My wish for Python was an addition to the language that allowed an easy and elegant concurrent model at the language level. Ideally a Python program with 1000 async objects parsing a 10TB in-memory XML file should run twice as fast on an 8-core CPU, compared to a 4-core ditto.
I think there's two fundamentally different layers getting conflated here, which is really confusing the issue.

Layer 1 is the user API for concurrency. At this layer, there are two major options in current Python. The first option is the "implicit interleaving" model provided by classic threads, stackless, gevent, goroutines, etc., where as a user you write regular "serial" code + some calls to thread spawning primitives, and then the runtime magically arranges for multiple pieces of "serial" code to run in some kind of concurrent/parallel fashion. One downside of this approach is that because the runtime gets to arbitrarily decide how to interleave the execution of these different pieces of code, it can be difficult for the user to reason about interactions between them. So this motivated the second option for user APIs: the "explicit interleaving" model where as a user you annotate your code with some sort of marker saying where it's willing to be suspended (Python uses the "await" keyword), and then the runtime is restricted to only running one piece of code at a time, and only switching between them at these explicitly marked points. (The canonical reference on this is https://glyph.twistedmatrix.com/2014/02/unyielding.html)

(I like to think about this as opt-out concurrency vs opt-in concurrency: the first model is concurrent by default except where you explicitly use a mutex; the second is serial by default except where you explicitly use "await".)

So that's the user API level. Then there's Layer 2, the strategies that the runtime underneath uses to implement whichever semantics are in play. There are a lot of options here -- in particular, within the "implicit interleaving" model Python has existing production-ready implementations using OS level threads with a GIL (CPython's threading module), clever C stack manipulation tricks on a single OS level thread (gevent), OS level threads without a GIL (Jython's threading module), etc., etc. Picking between these is an implementation trade-off, not a language-level semantics trade-off -- from the point of view of the user API, they're pretty much interchangeable.

...And in principle you could also use any of these options to implement the "explicit interleaving" approach. For example, each coroutine could get assigned its own OS level thread, and then to get the 'await' semantics you could have a shared global lock that gets dropped when entering an 'await' and then re-acquired afterwards. This would be silly and inefficient compared to what asyncio actually does (it uses a single thread, like gevent), so no-one would do this. But my point is that at the user API level, again, these are just implementation details -- this would be a valid way to implement the async/await semantics.

So what can we conclude from all this?

First, if your goal is to write code that gets faster when you add more CPU cores, then that means you're looking for a particular implementation strategy: you want OS level threads, and no GIL. One way to do this would be to keep the Python language semantics the same, while modifying CPython's implementation to remove the GIL. This turns out to be really hard :-). But Jython demonstrates that the existing APIs are sufficient to make it possible -- the difficulties are in the CPython implementation, not in the language, so that's where it would need to be fixed. If someone wants to push this forward probably the thing to do is to see how Larry's "gilectomy" project is doing and help it along.
Another strategy would be to come up with some new user API that can be added to the language, and whose semantics are more amenable to no-GIL-multithreading. There are lots of somewhat nascent ideas out there -- IIRC Eric's been thinking about using subinterpreters to add shared-nothing threads (versus the shared-everything threads which Python currently supports -- shared nothing is what Erlang does), there's Armin's experiments with STM in PyPy, there's PyParallel, etc. Nick has a good summary: http://python-notes.curiousefficiency.org/en/latest/python3/multicore_python...

But -- and this is the main point I've been leading up to -- async/await is *not* the new user-level API that you're looking for. Async/await were created to enable the "explicitly interleaved" style of programming, which as we saw above effectively takes the GIL and promotes it to becoming an explicit part of the user API, instead of an implementation detail of the runtime. This is the one and only reason async/await exist -- if you don't want to explicitly control where your code can switch "threads" and be guaranteed that no other code is running at the same time, then there is no reason to use async/await.

So I think the objection to async/await on the grounds that they clutter up the code is based on a misunderstanding of what they're for. It wasn't that we created these keywords to solve some implementation problem and then inflicted them on users. It's exactly the other way around. *If* you as a user want to add some explicit annotations to your code to control how parallel execution can be interleaved, *then* there has to be some keywords to write those annotations, and that's what async/await are. And OTOH if you *don't* want to have markers in your code to explicitly control interleaving -- if you prefer the "implicit interleaving" style -- then async/await are irrelevant and you shouldn't use them, you should use threading/gevent/whatever.

-n

-- Nathaniel J. Smith -- https://vorpus.org
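
[The contrast can be boiled down to a few lines; a toy sketch: the threaded version needs a lock because a switch can happen anywhere, while the coroutine version can only switch at the marked await.]

    import asyncio, threading

    counter = 0
    lock = threading.Lock()

    # Implicit interleaving: the runtime may switch between threads at
    # almost any point, so even a bare increment needs the lock
    def thread_worker():
        global counter
        with lock:
            counter += 1

    # Explicit interleaving: other coroutines can only run at "await",
    # so the increment below is atomic with respect to them
    async def coro_worker():
        global counter
        counter += 1
        await asyncio.sleep(0)  # the one visible suspension point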

Nathaniel Smith wrote:
It wasn't that we created these keywords to solve some implementation problem and then inflicted them on users.
I disagree -- looking at the history of how we ended up with async/await, it looks to me like this is exactly what *did* happen. First we had generators. Then 'yield from' was invented to (among other things) leverage them as a way of getting lightweight threads. Then 'await' was introduced as a nicer way to spell 'yield from' when using it for that purpose. Saying that 'await' is good for you because it makes the suspension points visible seems to me a rationalisation after the fact. It was something that emerged from the implementation, not a prior design requirement. -- Greg

On Thu, Oct 6, 2016 at 12:45 AM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
I wasn't trying to write a detailed account of the development, as much as try to capture some essential features. Myth, not history :-).

In the final design, the one and only thing that distinguishes async/await from gevent is that in the former the suspension points are visible, and in the latter they aren't. I don't really believe that it's an accident that people put a lot of effort into creating async/await in this way at a time when gevent already existed and was widely used in production, and we have historical documents like Glyph's blog arguing for visible yield points as a motivation for async/await, but... even if you think it *was* an accident, it hardly matters at this point. The core distinguishing feature between async/await and gevent is the visibility of suspension points, so it might as well be the case that async/await is designed for exactly those people who want visible suspension points.

(And I didn't say await or visible suspension points are necessarily "good for you" -- obviously the implicit and explicit interleaving approaches have trade-offs you'll have to judge for yourself. But there are some people in some situations who want explicit interleaving, and async/await is there for them.)

-n

-- Nathaniel J. Smith -- https://vorpus.org

Nathaniel Smith wrote:
They're not quite independent axes, though. Gevent is based on greenlet, which relies on some slightly dubious tricks at the C level and doesn't play well with some external libraries. As far as I know, there's no current alternative that's just as efficient and portable as asyncio but without the extra keywords. If you want the full benefits of asyncio, you're forced to accept explicit suspension points. -- Greg

On Thu, Oct 6, 2016 at 4:12 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
I'd be interested to hear more about this. gevent/greenlet don't seem to have an official "list of supported platforms" that I can find, but I can't find concrete examples of unsupported platforms either. Are we talking like, HPUX-on-MIPS or...? And obviously there are always going to be some cases that are better supported by either one tool or another, but as we've seen getting external libraries to play well with asyncio is also pretty non-trivial (exactly because of those explicit suspension points!), and my impression was that for now gevent actually had a larger ecosystem. For folks who prefer the gevent API, is it really easier to port libraries to asyncio than to port them to gevent? -n -- Nathaniel J. Smith -- https://vorpus.org

On 7 October 2016 at 16:42, Nathaniel Smith <njs@pobox.com> wrote:
It's definitely *not* easier, as gevent lets you suspend execution inside arbitrary CPython magic method calls. That's why you can still use SQL Alchemy's ORM layer with gevent - greenlet can swap the stack even with the extra C call frames on there.

If you're running in vanilla CPython (or recent non-Windows versions of PyPy2), on a relatively mainstream architecture like x86_64 or ARM, then gevent/greenlet will be fine as an application's synchronous/asynchronous bridge. However, if you're running in a context that embeds CPython inside a larger application (e.g. mod_wsgi inside Apache), then gevent's assumptions about how the C thread states are managed may be wrong, and hence you may be in for some "interesting" debugging sessions. The same goes for any library that implements callbacks that end up executing a greenlet switch when they weren't expecting it (e.g. while holding a threading lock - that will protect you from other OS threads, but not from other greenlets in the same thread).

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

I can speak to this. It's been my professional experience with gevent that choosing to obtain concurrency by using gevent as opposed to explicit async was a trade-off: we replaced a large amount of drudge work in writing a codebase with async/await pervasively throughout it with a smaller amount of dramatically (10x to 100x) more intellectually challenging debugging work when unstated assumptions regarding thread-safety and concurrent access were violated.

For many developers these trade-offs are sensible and reasonable, but we should all remember that there are costs and advantages to most kinds of runtime model. I'm happy to have a language that lets me do all of these things rather than one that chooses one for me and says "that ought to be good enough for everyone".

Cory

On 6 October 2016 at 17:45, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
I'd say it emerged from most folks still not grasping generators-as-coroutines a decade after PEP 342, and asynchronous IO in general ~15 years after Twisted was first released. When a language usage pattern is supported for that long, but folks still don't grok how it might benefit them, you have a UX problem, and one of the ways to address it is to take the existing pattern and give it dedicated syntax, which is exactly what PEP 492 did.

Dedicated syntax at least dramatically lowers the barrier to *recognition* of the coroutine design pattern when it's being used, and can help with explaining it as well (since the overlap with other concepts in the language becomes a hidden implementation detail rather than being an essential part of the user experience).

The shadow thread idea will hopefully prove successful in addressing the last major rough spot in the UX, which is the ability to easily integrate asynchronous components into an otherwise synchronous application.

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

2016-10-06 13:50 GMT+02:00 Nick Coghlan <ncoghlan@gmail.com>:
That's my opinion as well. If I had to run asyncio coroutines from synchronous code, I'd probably take advantage of the Executor interface defined by concurrent.futures. Executors handle resource management through a context manager interface, which is a good way to start and clean up after the shadow thread. Also, the submit method returns a concurrent.futures.Future, i.e. the standard for accessing an asynchronous result from synchronous code.

Here's a simple implementation: https://gist.github.com/vxgmichel/d16e66d1107a369877f6ef7e646ac2e5

If this is not enough (say one wants to write a synchronous API to an asynchronous library), then it simply is a matter of instantiating the executor once in the module and wrapping all the coroutines to expose with executor.submit and Future.result. This might provide an acceptable answer to the DRY thing that has been mentioned a few times, though I'm not convinced it is such a problematic issue (at least nothing that sans-io doesn't already address in the first place).
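
[The shape of such an executor might be as follows -- a sketch along those lines, not the gist itself.]

    import asyncio, threading

    class AsyncioExecutor:
        # An executor-like object owning a shadow event loop thread
        def __init__(self):
            self.loop = asyncio.new_event_loop()
            self.thread = threading.Thread(target=self.loop.run_forever)
            self.thread.start()

        def submit(self, coro):
            # Like Executor.submit, returns a concurrent.futures.Future
            return asyncio.run_coroutine_threadsafe(coro, self.loop)

        def __enter__(self):
            return self

        def __exit__(self, *exc_info):
            # Context manager interface handles starting up and cleaning up
            self.loop.call_soon_threadsafe(self.loop.stop)
            self.thread.join()
            self.loop.close()

    with AsyncioExecutor() as ex:
        fut = ex.submit(asyncio.sleep(0.1, result=42))
        print(fut.result())  # 42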

Nick Coghlan wrote:
However, it was just replacing one way of explicitly marking suspension points ("yield from") with another ("await"). The fact that suspension points are explicitly marked was driven by the implementation from the beginning. When I first proposed "yield from" as an aid to using generators as coroutines, my intention was always to eventually replace it with something else. PEP 3152 was my proposal for what the something else might be. I initially regarded it as a wart that it still required a special syntax for suspendable calls, and felt the need to apologise for that. I was totally surprised when people said they actually *liked* the idea of explicit suspension points. -- Greg

On 6 October 2016 at 05:20, Sven R. Kunze <srkunze@mail.de> wrote:
It's not a question that's up for debate - as a point of factual history, Python's runtime model is anchored in the C runtime model, and this pervades the entire language design. Simply wishing that Python's core runtime design was other than it is doesn't make it so.

We can diverge from that base model when we decide there's sufficient benefit in doing so (e.g. the object model, the import system, the numeric tower, exception handling, lexical closures, generators, generators-as-coroutines, context management, native coroutines), but whatever we decide to do still needs to be expressible in terms of underlying operating system provided C primitives, or CPython can't implement it (and if CPython can't implement a feature as the reference implementation, that feature can't become part of the language definition).

Postponing the point at which folks are confronted by those underlying C-level constraints is often an admirable goal, though - the only thing that isn't possible without fundamentally changing the language is getting rid of them entirely.

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 2016-10-06 03:27, Nick Coghlan wrote:
That may be true, but the limitation there is Python's core runtime model, not C's. As you say, Python's runtime model is historically anchored in C, but that doesn't mean C's runtime model itself directly constrains Python's. As others have mentioned, there are plenty of other languages that are themselves written in C but have different runtime models. The constraint is not compatibility with the C runtime model, but backward compatibility with Python's own earlier decisions about its own runtime model.

This may sound like an academic point, but I just want to mention it because, as you say later, hiding C from the Python programmer is often an admirable goal. I would go so far as to say it is almost always an admirable goal. The Python runtime isn't going to suddenly change, but we can make smart decisions about incremental changes in a way that, over time, allows it to drift further from the C model, rather than adding more and more tethers linking it more tightly to the C model.
Sure. But over the long term, almost anything is possible. As I said above, my own opinion is that hiding C from Python users is almost always a good thing. I (and I think many other people) use Python because I like Python. If I liked C I would use C. To the extent that Python allows C to constrain it (or, more specifically, allows the nature of C to constrain people who are only writing Python code), it limits its ability to evolve in a way that frees users from the things they don't like about C.

This is kind of tangential to the current issue about async. To be honest I am quite ignorant of how async/await will help or hurt me as a Python user. As you say, certain constraints are unavoidable. (We don't have to use C's runtime model, but we do have to be able to write our runtime model in C.) But I think it's good, when thinking about these features, to think how they will constrain future language development versus opening it up.

If, for instance, people start using async/await and old-school generator-send-style coroutines become unused, it will be easier to deprecate generator-send in the distant future. On the flip side, I would hate to see decisions made that result in lots of Python code that "bakes in" specific runtime model assumptions, making it more difficult to leave those assumptions behind in the future.

-- Brendan Barnwell

"Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown

Nick Coghlan writes:
How can there be a conflict between Python implementing the C runtime model *itself*, which says "you can do anything anywhere anytime", and some part of Python implementing the more restricted models that allow safe concurrency? If you can do anything, well, you can voluntarily submit to compiler discipline to a restricted set. No? So it must be that the existing constructions (functions, for, with) that need an "async" marker have an implementation that is itself unsafe. This need is not being explained very well. What is also not being explained is what would be lost by simply using the "safe" implementations generated by the async versions everywhere.

These may be hard to explain, and I know you, Yury, and Guido are very busy. But it's frustrating for all to see this go around in a circle: "it's like it is because it has to be that way, so that's the way it is".

There's also the question of "is async/await really a language feature, or is it patching up a deficiency in the CPython implementation that other implementations don't necessarily have?" (which has been brought up before, in less contentious terms).
That's understood, of course. The question that isn't being answered well is "why can't that non-C-like runtime concept be like Go or Erlang or Rust?" Or, less obtusely, "what exactly is the 'async' runtime concept, and why is it preferred to the concepts implemented by Go or Erlang or Rust or gevent or greenlets or Stackless?" I guess the answer to "why not Stackless?" is buried in the archives for Python-Dev somewhere, but I need to get back to $DAYJOB, maybe I'll look it up later.

Agree, well put. The Erlang runtime (VM) is also written in C, so anything should be possible. I do not advocate that Python should be a "new" Erlang or Go, just saying that, since we are introducing some level of concurrency in Python, we should look at some of the elegant ways others have achieved this and try to implement something like that in Python.
I understand that there is a lot of backwards compatibility, especially in regard to the Python/C interface, but I think that it is possible to find an elegant solution to this.
This would be very interesting to understand.
I will try to look for that, I have some time on my hands, not sure I have the %BRAINSKILL, but nevertheless… br /Rene

On 6 October 2016 at 15:15, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
Anything is possible in C, but not everything is readily supportable :)

When you design a new language and runtime from scratch, you get to set new rules and expectations if you want to do that. Ericsson did it with Erlang and BEAM (the reference Erlang VM) by declaring "Everything's an Actor in the 'Actor Model' sense, and Actors can send messages to each other's mailboxes". That pushes you heavily towards application designs where each "process" is a Finite State Machine with state changes triggered by external events, or by messages from other processes. If BEAM had been published as open source a decade earlier than it eventually was, I suspect the modern computing landscape would look quite different from the way it does today.

Google did something similar with Golang and goroutines by declaring that Communicating Sequential Processes would be their core concurrency primitive rather than C's shared memory threading.

By contrast, Python, C++, Java, C#, Objective-C all retained C's core thread-based "private stack, shared heap" concurrency model, which later expanded to also include thread local heap storage. Rust actually retains this core "private stack, private heap, shared heap" model, but changes the management of data ownership to avoid the messy problems that arise in practice when using the "everything is accessible to every thread by default" model.
Correct (for a given definition of unsafe): in normal operation, CPython uses the *C stack* to manage the Python frame stack, so when you descend into a new function call in CPython, you're also using up more C level stack space. This means that when CPython throws RecursionError, what it's actually aiming to prevent is a C level segfault arising from running out of stack space to manage frames:

    $ ./python -X faulthandler
    Python 3.6.0b1+ (3.6:b995b1f52975, Sep 22 2016, 01:19:04)
    [GCC 6.1.1 20160621 (Red Hat 6.1.1-3)] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import sys
    >>> sys.setrecursionlimit(int(1e6))
    >>> def f(): f()
    ...
    >>> f()
    Fatal Python error: Segmentation fault

    Current thread 0x00007fe977a7c700 (most recent call first):
      File "<stdin>", line 1 in f
      File "<stdin>", line 1 in f
      File "<stdin>", line 1 in f
      [<manual snip>]
      ...
    Segmentation fault (core dumped)

Loops, with statements and other magic method invocations all work that way - they make a C level call to the magic method implementation which may end up running a new invocation of the eval loop to evaluate the bytecode of a magic method implementation that's written in Python.

The pay-off that CPython gets from this is that we get to delegate 99.9% of the work for supporting different CPU architectures to C compiler developers, and we get a lot of capabilities "for free" when it comes to stack management.

The downside is that C runtimes don't officially support swapping out the stack of the current thread with new contents. It's *possible* to do that (hence Stackless and gevent), but you're on your own when it comes to debugging it when it breaks. That makes it a good candidate for an opt-in "expert users only" capability - folks that decide gevent is the right answer for their needs can adopt it if they want to (perhaps restricting their choice of target platform and C extension modules as a result), while we (as in the CPython core devs) don't need to keep custom stack manipulation code working on all the platforms where CPython is supported and with all the custom C extension modules that are out there.
The two main problems with that idea are speed and extension module compatibility.

The speed aspect is simply that we have more than 4 decades behind us of CPU designers and compiler developers making C code run fast. CPython uses that raw underlying speed to offer a lot of runtime flexibility with a relatively simple implementation while still being "fast enough" for many use cases. Even then, function calls are still notoriously slow, and await invocations tend to be slower still.

The extension module compatibility problem is simply that whereas you can emulate a normal Python function just by writing a normal C function, emulating a Python coroutine involves implementing the coroutine protocol. That's possible, but it's a lot more complicated, and even if you implemented a standard wrapper, you'd be straight back to the speed problem.

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Nick Coghlan wrote:
One of the main benefits is that it's very easy for external code to make callbacks to Python code. The original implementation of Stackless decoupled the eval stack from the C stack, but at the expense of making the API for calling external C code much less straightforward. -- Greg

On 2016-10-06 1:15 AM, Stephen J. Turnbull wrote:
To add to what Nick said. I myself would want to use a time machine to help design the CPython runtime to allow Golang-style concurrency (although Golang has its own bag of problems). Unfortunately there is no time machine, and implementing that in CPython today would be an impossibly hard and long task.

To start, no matter how exactly you want to approach this, it would require us to do a *complete rewrite* of CPython internals. This is so complex that we wouldn't be able to even estimate how long it would take us. This would be a far more significant change than Python 2->3. BTW in the process of doing that, we would have to completely redesign the C API, which would effectively kill the entire numpy/scipy ecosystem. If someone disagrees with this, I invite them to go ahead and write a PEP (please!)

On the other hand, async/await and non-blocking IO make it possible to write highly concurrent network applications. Even languages with good support for threading, such as C#, have async/await [sic!]. Even Rust users want them, and will likely add them in the language or std lib. Even C++ might have coroutines soon. Why? Because Rust and C# can't "just" implement the actor model. Because threads are hard, and lead to deadlocks and code that is hard to reason about. Because threads can't scale as well as non-blocking IO.

We probably could implement actors if we decided to merge Stackless or use greenlets in the core. Anyone who has looked at/debugged the implementation of greenlets would say it's a bad idea. And gevent is available for those who want to use them anyway.

In the end, async/await is the only *practical* solution for a language like Python. Yes, it's a bit harder to design libraries that support both synchronous and asynchronous APIs, but there's a way: separate your protocol parsing from IO. When done properly, it's easier to write unit tests and it's a no-brainer to add support for different IO models.

Yury
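
[A toy example of that separation (names are illustrative): the parser below never touches a socket, so the same class can serve both a blocking and an asyncio front end.]

    class LineProtocol:
        # Pure protocol state machine: bytes in, complete lines out, no IO
        def __init__(self):
            self._buf = b""

        def feed(self, data):
            self._buf += data
            *lines, self._buf = self._buf.split(b"\n")
            return lines

    def read_lines_sync(sock):
        # Synchronous driver around the parser
        proto = LineProtocol()
        while True:
            data = sock.recv(4096)
            if not data:
                return
            for line in proto.feed(data):
                yield line

    async def read_lines_async(reader, handle):
        # Asynchronous driver reusing the parser unchanged
        proto = LineProtocol()
        while True:
            data = await reader.read(4096)
            if not data:
                return
            for line in proto.feed(data):
                handle(line)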

Regarding the Python C-runtime and async, I just had a good talk with Kresten Krab at Trifork. He implemented "Erjang", the Java implementation of the Erlang VM (www.erjang.org <http://www.erjang.org/>). Doing this he had access to the Erlang (C) VM. It turns out that the Erlang VM and the Python VM have a lot of similarities; the differences are more in the language than in the VM. Differences between the Erlang VM and Python related to async are:
1) Most variables in Erlang are immutable, making it easier to have coroutines.
2) Coroutines are built into Erlang via the "spawn" primitive, leaving the specific implementation to the VM, but never implemented with OS threads.
3) All coroutines have their own heap and stack (initially 200 bytes), which can grow as needed.
4) Coroutines are managed in a "ready queue", from which the VM thread executes the next ready job; each job gets 2000 "instructions" (or runs until it blocks on IO), and then the next coroutine is executed.
Because of this, when multicore CPUs entered the game, it was quite easy to change the Erlang VM to add a thread per core pulling from the ready queue. This makes an Erlang program run (almost) twice as fast every time the number of cores is doubled! Given this, I am still convinced that: obj = async SomeObject() should be feasible, even though there will be some Golang-like issues about shared data, but there could be several ways to handle this. br /Rene

The problem is that if your goal is to make a practical proposal, it's not enough to look at Python-the-language. You're absolutely right, AFAICT there's nothing stopping someone from making a nice implementation of Python-the-language that has Erlang-style cheap shared-nothing threads with some efficient message-passing mechanism. But! It turns out that unless your new implementation supports the CPython C API, it's almost certainly not viable as a mainstream CPython alternative, because there's this huge, huge pile of libraries that have been written against that C API. You're not competing against CPython, you're competing against CPython+thousands of libraries that you don't have and that your users expect. And unfortunately, it turns out that the C API locks in a bunch of the implementation assumptions (refcounting, the GIL, use of the C stack, poor support for isolation between different interpreter states, ...) that you were trying to get away from. I mean, in many ways it's a good problem to have, that our current ecosystem is just so attractive that it's hard to compete with! (Though a pessimist could point out that this difficulty with competing with yourself is exactly what tends to eventually undermine incumbents -- cf. the innovator's dilemma.) And it's "just" a matter of implementation, not Python-the-language itself. But the bottom line is: this is *the* core problem that you have to grapple with if you want to make any radical improvements in the Python runtime and have people actually use them. -n On Mon, Oct 17, 2016 at 9:36 AM, Rene Nejsum <rene@stranden.com> wrote:
-- Nathaniel J. Smith -- https://vorpus.org

You are right about the importance of the Python C API; it often goes under my radar. For the past 20 years I have only used it a couple of times (to integrate Python into some existing C code), therefore it is not as much in focus for me as it should be, and definitely is for others. I get your innovator's dilemma all too well, just look at Python 3 and the time it took us to shift from 2. But, watching Larry Hastings' talk on his awesome gilectomy project, it was my understanding that he at least saw it as a possibility to do a backward-compatible extension of the C API for his GIL removal project. As I understand it, he proposes that the Python runtime should check whether a given C lib has been upgraded to support non-GIL operation, and if not, run it as an old version. I am not sure how much it will take in this case, but I thought "hey, if Larry Hastings is removing the GIL and proposing an extension to the C API, at least it can be done" :-) /Rene

On 05.10.2016 08:49, Rene Nejsum wrote:
As a result of past discussions, I wrote the module "xfork" which basically does this "golang goroutine" stuff. It's just a thin wrapper around "futures", but it avoids what René and Anthony object to. I had a look at xfork, and really like it. It is implemented much like the lower level of PYWORKS, and PYWORKS could build on xfork instead.
Thanks. :)
I think that the "model" of doing async should be defined in the Python language/runtime (like in Go, Erlang, ABCL). In the ideal case it should be up to the runtime implementation (CPython, PyPy, Jython, IronPython etc.) how the asynchronous behaviour is implemented (greenlets, threads, roll-your-own, etc.)
That's the way I see it as well. The Python language is extremely high-level. So, I guess in most cases, most people would just use the default implementation. Cheers, Sven

I agree 100%. Ideally I think a language (I would love it to be Python) should permit many (millions) of what we know as coroutines, and then have as many threads as the CPU has cores to execute these coroutines, but I do not think you as a programmer should need to be especially aware of this as you code. (Just like GC handles your alloc/free, the runtime should handle your "concurrency".)
People want to know how they are supposed to write unified, non-insane-and-ugly code in this a/sync python 2/3 world we now find ourselves in. I've been eagerly watching this thread for the answer, thus far to no avail.
Agree
Sans-io suggests we write bite-sized synchronous code that can be driven by a/sync consumers. While this is all well and good, how does one write said consuming library for both I/O styles without duplication?
The answer seems to be "write everything you ever wanted as async and throw some sync wrappers around it". Which means all the actual code I write will be peppered with async and await keywords.
Have a look at the examples in David Beazley's curio; he is one of the most knowledgeable Python people I have met, but that code is almost impossible to read and understand.
In Go I can spawn a new control state (goroutine) at any time against any function. This is clear in the code. In Erlang I can spawn a new control state (Erlang process) at any time and it's also clear. Erlang is a little different because it will preempt me, but the point is I am simply choosing a target function to run in a new context. Gevent and even threading module is another example of this pattern.
Having thought some more about it, I think that putting async in front of the object could be kind of like a channel in Go and other languages?
In all reality you don't typically need many suspension points other than around I/O, and occasionally heavy CPU, so I think folks are struggling to understand (I admit, myself included) why the runtime doesn't want to be more help and instead punts back to the developer.
Well put, we are definitely on the same page here, thank you. br /Rene
--
C Anthony

On Mon, Oct 3, 2016 at 10:37 PM, Rene Nejsum <rene@stranden.com> wrote:
There's a problem with this model (of using all CPUs to run coroutines), since when you have two coroutines that can run in unspecified order but update the same data structure, the current coroutine model *promises* that they will not run in parallel -- they may only alternate running if they use `await`. This promise implies that you can update the data structure without worrying about locking as long as you don't use `await` in the middle. (IOW it's non-pre-emptive scheduling.) If you were to change the model to allow multiple coroutines to execute in parallel on multiple CPUs, such coroutines would have to use locks, and then you have all the problems of threading back in your coroutines! (There might be other things too, but there's no way to avoid a fundamental change in the concurrency model.) Basically you're asking for Go's concurrency model -- it's nice in some ways, but asyncio wasn't made to do that, and I'm not planning to change it (let's wait for a GIL-free Python 4 first). I'm still trying to figure out my position on the other points of discussion here -- keep discussing! -- --Guido van Rossum (python.org/~guido)
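A small illustrative sketch of the promise described above (the code is mine, not from the thread): under asyncio's non-pre-emptive scheduling the first coroutine needs no lock, while the second can lose updates because it suspends in the middle of a read-modify-write:

    import asyncio

    counter = 0

    async def safe_increment():
        global counter
        # No await between read and write: no other coroutine can
        # interleave here, so no lock is needed.
        counter += 1

    async def racy_increment():
        global counter
        value = counter
        await asyncio.sleep(0)   # suspension point: others may run now
        counter = value + 1      # may clobber an update made meanwhile

    async def demo():
        await asyncio.gather(*[racy_increment() for _ in range(100)])
        print(counter)   # prints 1, not 100: every update but one was lost

    asyncio.get_event_loop().run_until_complete(demo())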

Well, yes and no. In other languages (Java/C#) where I have implemented concurrent objects à la PYWORKS it works pretty well, as long as you have fewer than maybe 10,000 threads. But in Python (CPython 2 on a multicore CPU) threads do not work! The GIL makes it impossible to have, for example, 100 threads sending messages between each other (see the Ring example in PYWORKS); that's one reason why it would be interesting to have some kind of concurrency support built into the Python runtime. Today I see all kinds of tricks and workarounds to get around the GIL, ranging from starting several Python interpreters to difficult-to-read code using yield (now async/await), but when you have seen much more elegant support (Go, Erlang, maybe even ABCL) you kind of wish this could be added to your own favourite language. br /Rene


On Tue, Oct 4, 2016 at 8:32 AM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
That keeps getting talked about, but one thing I've never seen is any sort of benchmark showing (probably per operating system) how many concurrent requests you need before threads become unworkable. Maybe calculate it in milli-Wikipedias, on the basis that English Wikipedia is a well-known site and has some solid stats published [1]. Of late, it's been seeing about 8e9 hits per month, or about 3000/second. So one millipedia would be three page requests per second. A single-core CPU running a simple and naive Python web application can probably handle several millipedias without any concurrency at all. (I would define "handle" as "respond to requests without visible queueing", which would mean about 50ms, for a guess - could maybe go to 100 or even 250, but then people might start noticing the slowness.) Once you're receiving more requests than you can handle on a single thread, you need concurrency, but threading will do you fine for a while. Then at some further mark, threading is no longer sufficient, and you need something lighter-weight, such as asyncio. But has anyone ever measured what those two points are? ChrisA [1] http://reportcard.wmflabs.org/

My implementation is, but it should not (have to) be; it only reflects my limited ability and time :-) The programmer should not need to be aware of whether concurrency is achieved through coroutines or threads; ideally there should be one OS thread per core in the CPU, running many (millions of) coroutines… br /Rene

Hi Yann,
On 03 Oct 2016, at 17:46, Yann Kaiser <kaiser.yann@gmail.com> wrote:
The way I see it, the great thing about async/await as opposed to threading is that it is explicit about when execution will "take a break" from your function or resume into it. This is made clear and readable through the use of `await` keywords.
The way I read this argument, a parallel could be “the great thing about alloc/free is that it is explicit about when allocation will happen”, but I believe that the more control you can leave to the runtime the better.
Your proposal unfortunately goes directly against this idea of explicitness. You won't know what function will need to be fed into an event loop or not. You won't know where your code is going to lose or gain control.
I believe that you should be able to write concurrent code without being too explicit about it, but let the runtime handle low-level timing, as long as you know your code will execute in the intended order. br /Rene

On Oct 3, 2016 7:09 PM, "Stephen J. Turnbull" < turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
They are referring to the synchronous nature of any independent control state. Whether it's a thread, a coroutine, a continuation, or whatever else doesn't really matter much. When a thing runs concurrently alongside other things, it's still synchronous with respect to itself regardless of how many context switches occur before completion. Such things only need mechanisms to synchronize in order to cooperate. People want to know how they are supposed to write unified, non-insane-and-ugly code in this a/sync python 2/3 world we now find ourselves in. I've been eagerly watching this thread for the answer, thus far to no avail. Sans-io suggests we write bite-sized synchronous code that can be driven by a/sync consumers. While this is all well and good, how does one write said consuming library for both I/O styles without duplication? The answer seems to be "write everything you ever wanted as async and throw some sync wrappers around it". Which means all the actual code I write will be peppered with async and await keywords. In Go I can spawn a new control state (goroutine) at any time against any function. This is clear in the code. In Erlang I can spawn a new control state (Erlang process) at any time and it's also clear. Erlang is a little different because it will preempt me, but the point is I am simply choosing a target function to run in a new context. Gevent and even the threading module are other examples of this pattern. In all reality you don't typically need many suspension points other than around I/O, and occasionally heavy CPU, so I think folks are struggling to understand (I admit, myself included) why the runtime doesn't want to be more help and instead punts back to the developer. -- C Anthony

On 4 October 2016 at 10:48, C Anthony Risinger <anthony@xtfx.me> wrote:
Right, this thread is more about "imperative shell, asynchronous execution" than it is about event-driven servers. http://www.curiousefficiency.org/posts/2015/07/asyncio-background-calls.html and the code at https://bitbucket.org/ncoghlan/misc/src/default/tinkering/background_tasks.p... give an example of doing that with "schedule_coroutine", "run_in_foreground" and "call_in_background" helpers to drive the event loop.
Because the asynchronous features are mostly developed by folks working on event-driven servers, and the existing synchronous APIs are generally fine if you're running from a synchronous shell. That leads to the following calling models being reasonably well-standardised:

- non-blocking synchronous from anywhere: just call it
- blocking synchronous from synchronous: just call it
- asynchronous from asynchronous: use await
- blocking synchronous from asynchronous: use "loop.run_in_executor()" on the event loop

The main arguable aspect there is "loop.run_in_executor()" being part of the main user-facing API, rather than offering a module level `asyncio.call_in_background` helper function. What's not well-defined are the interfaces for calling into asynchronous code from synchronous code. The most transparent interface for that is gevent and the underlying greenlet support, which implement that at the C stack layer, allowing arbitrary threads to be suspended at arbitrary points. This doesn't give you any programming model benefits, it's just a lighter weight form of operating system level pre-emptive threading (see http://python-notes.curiousefficiency.org/en/latest/pep_ideas/async_programm... for more on that point). The next most transparent would be to offer a more POSIX-like shell experience, with the concepts of foreground and background jobs, and the constraint that the background jobs scheduled in the current thread only run while a foreground task is active. As far as I know, the main problems that can currently arise with that latter approach are when you attempt to run something in the foreground, but the event loop in the current thread is already running. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
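For the last entry in that list, a minimal sketch of the "blocking synchronous from asynchronous" model (the blocking function here is an invented stand-in for any synchronous API):

    import asyncio
    import time

    def blocking_call():
        time.sleep(1)   # stands in for any blocking synchronous call
        return 42

    async def main():
        loop = asyncio.get_event_loop()
        # Hand the blocking call to the default thread pool executor,
        # so the event loop itself never blocks.
        result = await loop.run_in_executor(None, blocking_call)
        print(result)

    asyncio.get_event_loop().run_until_complete(main())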

Nick Coghlan writes:
What's not well-defined are the interfaces for calling into asynchronous code from synchronous code.
I don't understand the relevance to the content of the thread. As I understand the main point, Sven and Rene don't believe that [the kind of] async code [they want to write] should need any keywords; just start the event loop and invoke functions, and that somehow automatically DTRTs. (I.e., AFAICS the intent is to unify generators and coroutines despite the insistence of Those Who Have Actually Implemented Stuff that generator != coroutine.) N.B. As I understand it, although Rene uses the async keyword when invoking the constructor, this could be just as well done with a factory function since he speaks of "wrapping" the object. And his example is in your "just call it" category: nonblocking synchronous code. That doesn't help me understand what he's really trying to do. His PyWorks project is documented as implementing the "actor" model, but async is more general than that AFAICS, and on the other hand I can't see how you can guarantee that a Python function won't modify global state. So OK, I can see that a performant implementation of the actor pattern (don't we have this in multiprocessing somewhere?) with a nice API (that's harder :-) and documented restrictions on what you can do in there might be a candidate for stdlib, but I don't see how it's related to the "async(io)" series of PEPs, which are specifically about interleaving arbitrary amounts of suspension in a Python program (which might manipulate global state, but we want to do it in a way such that we know that code between suspension points executes "atomically" from the point of view of other coroutines). Anthony also objects to the keywords, ie, that he'll need to pepper his "dual-purpose" code with "async" and "await". Again, AFAICS that implies that he doesn't see a need to distinguish async from lazy (coroutine from generator), since AFAICS you'd break the world if you changed the semantics of "def foo" to "async def foo". So if you're going to write async/await-style code, you're going to have to swallow the keywords. Am I just missing something?

On 4 October 2016 at 17:50, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
Given the schedule_coroutine/run_in_foreground distinction, it's relatively easy (for a given definition of easy) to write a proxy object that would make the following work:

    class SomeClass(object):
        def some_sync_method(self):
            return 42
        async def some_async_method(self):
            await asyncio.sleep(3)
            return 42

    o = auto_schedule(SomeClass())  # Indicating that the user wants an async version of the object
    r1 = o.some_sync_method()       # Automatically run in a background thread
    r2 = o.some_async_method()      # Automatically scheduled as a coroutine
    print(run_in_foreground(r1))
    print(run_in_foreground(r2))

It's not particularly useful for an actual event driven server, but it should be entirely do-able for the purposes of providing a common interface over blocking and non-blocking APIs. What it *doesn't* do, and what you need greenlet for, is making that common interface look like you're using plain old synchronous C threads. If folks really want to do that, that's fine - they just need to add gevent/greenlet as a dependency, just as the folks that don't like the visibly object-oriented nature of the default unittest and logging APIs will often opt for third party alternative APIs that share some of the same underlying infrastructure. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
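One hypothetical shape such a proxy could take, handling only the synchronous-method half of the example (auto_schedule and run_in_foreground are the proposed names above; everything below is an illustrative guess, with a concurrent.futures.Future standing in for the result handle):

    import queue
    import threading
    from concurrent.futures import Future

    class AutoScheduleProxy:
        # Dispatch every method call to a single worker thread; the
        # caller immediately gets a Future back and blocks only when
        # it actually asks for the result.
        def __init__(self, obj):
            self._obj = obj
            self._calls = queue.Queue()
            threading.Thread(target=self._worker, daemon=True).start()

        def _worker(self):
            while True:
                future, name, args, kwargs = self._calls.get()
                try:
                    future.set_result(getattr(self._obj, name)(*args, **kwargs))
                except Exception as exc:
                    future.set_exception(exc)

        def __getattr__(self, name):
            def method(*args, **kwargs):
                future = Future()
                self._calls.put((future, name, args, kwargs))
                return future
            return method

With that, run_in_foreground(r1) would reduce to r1.result(); the coroutine half would additionally need to recognise coroutine methods and hand them to an event loop.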

On Tue, Oct 4, 2016 at 4:30 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
So maybe r1 and r2 are just concurrent.futures.Futures, and run_in_foreground(r) wraps r.result(). And auto_schedule() is a proxy that turns all method calls into async calls with a (concurrent) Future to wait for the result. There's an event loop somewhere that sits idle except when you call run_in_foreground() on something; it's only used for the async methods, since the sync methods run in a background thread (pool, I hope). Or perhaps r2 is an asyncio.Future and run_in_foreground(r2) wraps loop.run_until_complete(r2). I suppose the event loop should also be activated when waiting for r1, so maybe r1 should be an asyncio Future that wraps a concurrent Future (using asyncio.wrap_future(), which can do just that thing). Honestly it feels like many things can go wrong with this API model, especially since you haven't answered what should happen when a method of SomeClass (either a synchronous one or an async one) calls run_in_foreground() on something -- or, more likely, calls some harmless-looking function that calls another harmless-looking function that calls run_in_foreground(). At that point you have pre-emptive scheduling back in play (or your coroutines may be blocked unnecessarily) and I think you have nothing except a more complicated API to work with threads. I think I am ready to offer a counterproposal where the event loop runs in one thread and synchronous code runs in another thread and we give the synchronous code a way to synchronously await a coroutine or an asyncio.Future. This can be based on asyncio.run_coroutine_threadsafe(), which takes a coroutine or an asyncio.Future and returns a concurrent Future. (It also takes a loop, and it assumes that loop runs in a different thread. I think it should assert that.) The main feature of my counterproposal as I see it is that async code should not call back into synchronous code, IOW once you are writing coroutines, you have to use the coroutine API for everything you do. And if something doesn't have a coroutine API, you run it in a background thread using loop.run_in_executor(). So either you buy into the async way of living and it's coroutines all the way down from there, no looking back -- or you stay on the safe side of the fence, and you interact with coroutines only using a very limited "remote manipulator" API. The two don't mix any better than that. -- --Guido van Rossum (python.org/~guido)
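A minimal sketch of what that counterproposal looks like with today's asyncio (illustrative only, with the event loop confined to a daemon thread):

    import asyncio
    import threading

    loop = asyncio.new_event_loop()
    threading.Thread(target=loop.run_forever, daemon=True).start()

    async def compute():
        await asyncio.sleep(1)
        return 42

    # Synchronous side: the "remote manipulator" is the
    # concurrent.futures.Future returned by run_coroutine_threadsafe().
    future = asyncio.run_coroutine_threadsafe(compute(), loop)
    print(future.result())   # blocks this thread, never the event loop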

On 5 October 2016 at 02:15, Guido van Rossum <guido@python.org> wrote:
Yeah, that's the main reason I haven't gone beyond this as a toy idea - there are so many ways to get yourself in trouble if you don't already understand the internal details.
Oh, that makes a lot more sense, as we'd end up with a situation where async code gets used in one of two ways:

- asynchronous main thread (the typical way it gets used now)
- synchronous thread with a linked asynchronous helper thread

The key difference between the latter and a traditional thread pool is that there'd only be the *one* helper thread for any given synchronous thread, and as long as the parent thread keeps its hands off any shared data structures while coroutines are running, you can still rely on async/await to interleave access to data structures shared by the coroutines.
+1 I considered suggesting that the "remote manipulator" API could be spelled "await expr", but after starting to write that idea up, realised it was likely a recipe for hard-to-debug problems when folks forget to add the "async" declaration to a coroutine definition. So that would instead suggest 2 module level functions in asyncio:

* call_in_background(coroutine_or_callable, *args, **kwds):
  - creates the helper thread if it doesn't already exist, stores a reference in a thread local variable
  - schedules coroutines directly in the helper thread's event loop
  - schedules other callables in the helper thread's executor
  - returns an asyncio.Future instance
  - perhaps lets the EventLoopPolicy override this default behaviour?
* wait_for_result:
  - blocking call that waits for asyncio.Future.result() to be ready

Using "call_in_background" from a coroutine would be OK, but somewhat redundant (as if a coroutine is already running, you could just use the current thread's event loop instead). Using "wait_for_result" from a coroutine would be inappropriate, as with any other blocking call. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
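A hedged sketch of how those two helpers might look (neither exists in asyncio; the names come from the proposal above, the bodies are my guess, and this version hands back concurrent.futures.Future instances rather than the asyncio.Future described above):

    import asyncio
    import threading
    from concurrent.futures import ThreadPoolExecutor

    _local = threading.local()
    _executor = ThreadPoolExecutor(max_workers=4)

    def _helper_loop():
        # Create the helper thread lazily, one per synchronous thread.
        if not hasattr(_local, "loop"):
            loop = asyncio.new_event_loop()
            threading.Thread(target=loop.run_forever, daemon=True).start()
            _local.loop = loop
        return _local.loop

    def call_in_background(target, *args, **kwds):
        if asyncio.iscoroutine(target):
            return asyncio.run_coroutine_threadsafe(target, _helper_loop())
        return _executor.submit(target, *args, **kwds)

    def wait_for_result(future):
        return future.result()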

I am a little out of my depth here, but I think that if an object instance was guaranteed - by the Python runtime - to run in one coroutine/thread, and only the message passing of method calls and return values was allowed to pass between coroutine/thread contexts, then at least all local instance variable references would be fine?
Maybe not, but I am hoping for something better :-)

On 04.10.2016 13:30, Nick Coghlan wrote:
Maybe this is all a big misunderstanding. asyncio is incompatible with regular execution flow and it's **always blocking**. However, asyncio is perceived by some of us (including me) as a shiny alternative to processes and threads but really isn't. I remember doing this survey on python-ideas (results here: https://srkunze.blogspot.de/2016/02/concurrency-in-python.html) but I get the feeling that we still miss something. My impression is that asyncio shall be used for something completely different than dropping off things into a background worker. But looking at the cooking example given by Steve Dower (cf. blog post), at other explanations, at examples in the PEPs, it just seems to me that his analogy could have been made with threads and processes as well. At its core (the ASYNC part), asyncio is quite similar to threads and processes. But its IO part seems to drive some (design) decisions that don't go well with the existing mental model of many developers. Even PEP reviewers are fooled by simple asyncio examples. Why? Because they forget to spawn an event loop. "async def" and "await" are just useless without an event loop. And maybe that's what people's frustration is about. They want the ASYNC part without worrying about the IO part. Furthermore, adding 2 (TWO) new keywords to a language has an immense impact, especially when people are told "the barrier for new keywords is quite high!!". So, these new keywords must mean something. I think what would help here are concrete answers to:
0) Is asyncio a niche feature only to be used for better IO?
1) What is the right way of integrating asyncio into existing code?
2) How do we intend to solve the DRY-principle issue?
If the answer is "don't use asyncio", that's a fine result but honestly I think it would be just insane to assume that we got all these features, all this work and all those duplicated functions all for nothing. I can't believe that. So, I am still looking for a reasonable use-case of asyncio in our environment. Cheers, Sven

On 04.10.2016 09:50, Stephen J. Turnbull wrote:
I don't think that's actually what I wanted here. One simple keyword should have sufficed, just like in golang. So, the developer gets a way to decide whether or not he needs it blocking or non-blocking **when using a function**. He doesn't need to decide it **when writing the function**. You might wonder why this is relevant. The DRY principle has been mentioned, but there's more to it. Only the caller **can decide** whether it needs to wait or not. Why? Because the caller works WITH the result of the called function (whatever "result" means to you). The caller is (what Nick probably would call) the orchestrator, as it has the knowledge about the relation and interaction between domain-specific function calls. As a result of past discussions, I wrote the module "xfork" which basically does this "golang goroutine" stuff. It's just a thin wrapper around "futures", but it avoids what René and Anthony object to. Cheers, Sven
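For readers who haven't seen xfork: the gist of the "caller decides" idea can be sketched in a few lines (this is an illustration of the concept, not xfork's actual API):

    from concurrent.futures import ThreadPoolExecutor

    _pool = ThreadPoolExecutor(max_workers=4)

    def fork(fn, *args, **kwargs):
        # Start fn concurrently and hand back a future immediately.
        return _pool.submit(fn, *args, **kwargs)

    def expensive(n):
        return sum(i * i for i in range(n))

    result = fork(expensive, 10**6)  # the caller chose not to wait...
    # ... other work happens here ...
    print(result.result())           # ...and blocks only when it must.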

I agree, that's why I proposed putting the async keyword in when creating the object, saying that in this instance I want asynchronous communication with the object.
You might wonder why this is relevant. The DRY principle has been mentioned, but there's more to it. Only the caller **can decide** whether it needs to wait or not. Why? Because the caller works WITH the result of the called function (whatever "result" means to you). The caller is (what Nick probably would call) the orchestrator, as it has the knowledge about the relation and interaction between domain-specific function calls.
+1
As a result of past discussions, I wrote the module "xfork" which basically does this "golang goroutine" stuff. It's just a thin wrapper around "futures", but it avoids what René and Anthony object to.
I had a look at xfork, and really like it. It is implemented much like the lower level of PYWORKS, and PYWORKS could build on xfork instead. I think that the "model" of doing async should be defined in the Python language/runtime (like in Go, Erlang, ABCL). In the ideal case it should be up to the runtime implementation (CPython, PyPy, Jython, IronPython etc.) how the asynchronous behaviour is implemented (greenlets, threads, roll-your-own, etc.). br /Rene

Rene Nejsum writes:
On 04 Oct 2016, at 18:40, Sven R. Kunze <srkunze@mail.de> wrote:
I don't believe it's true, but suppose it is. *You don't need syntactic support* (a keyword) for it. Do you? It can all be done conveniently and readably with functions, as you have proved yourself with pyworks and Sven has with xfork, not to forget greenlets and gevent. No? You could argue that coroutines don't require syntax (keywords) either, but some Very Smart People disagree. I don't understand PEP 492's implementation well, but pretty clearly there are blockers to allowing ordinary __next__ methods doing async calls. There's also the issue mentioned in PEP 3153 that generators don't fit the notion of (self-actuated) producers "pushing" values into other code; they're really about having values pulled out of them. So PEPs 3156 and 492 are actual extensions to Python's capabilities for compact, readable expression of [a specific idiom/model of] asynchronous execution. They aren't intended for all possible models, just to help with one that is important to a fairly large class of Python programmers.
I think that the model of doing async should be defined in the Python language/runtime (like in Go, Erlang, ABCL) .
Why be restrictive? Python already supports many models of concurrency, pretty much filling the space (parallel execution vs. coroutines, shared-state vs. isolated, cooperative vs. preemptive, perhaps there are other dimensions). Why go backward from where we already are?

On 5 October 2016 at 16:49, Rene Nejsum <rene@stranden.com> wrote:
OK, I think there may be a piece of foundational knowledge regarding runtime design that's contributing to the confusion here. Python's core runtime model is the C runtime model: threads (with a local stack and access to a global process heap) and processes (which contain a heap and one or more threads). Anything else we do (whether it's generators, coroutines, or some other form of paused execution like callback management) gets layered on top of that runtime model. When folks ask questions like "Why can't Python be more like Go?", "Why can't Python be more like Erlang?", or "Why can't Python be more like Rust?" and get a negative response, it's usually because there's an inherent conflict between the C runtime model and whatever piece of the Go/Erlang/Rust runtime model we want to steal. So the "async" keyword in "async def", "async for" and "async with" is essentially a marker saying "This is not a C-like runtime concept anymore!" (The closest C-ish equivalent I'm aware of would be Apple's Grand Central Dispatch in Objective-C and that shows many of the async/await characteristics also seen in Python and C#: https://www.raywenderlich.com/60749/grand-central-dispatch-in-depth-part-1 ) Go (as with Erlang before it) avoided these problems by not providing C-equivalent functions in the first place. Accordingly, *every* normal function defined in Go can also be used as a goroutine, rather than needing to be a distinct type - their special case is defining functions that interoperate with external C libraries. Python (along with other languages built on the C runtime model like C# and Objective-C) doesn't have that luxury - we need to distinguish coroutines from regular functions, since we can't just handle them according to the underlying C runtime model any more. Guido's idea of a shadow thread to let synchronous threads run coroutines without needing to actually run a foreground event loop should provide a manageable way of getting the two runtime models (traditional C and asynchronous coroutines) to play nicely together in a single application, and has the virtue of being something folks can readily experiment with for themselves before we commit to anything specific in the standard library (since all the building blocks of thread local storage, event loop management, and inter-thread message passing primitives are already available). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 05.10.2016 18:06, Nick Coghlan wrote:
[runtime matters]
I think I understand your point. I also hope that others and I could provide you with our perspective. We see Python not as a C-like runtime but as an abstract modelling language. I know that it's different from the point of view of CPython internals; however, from the outside Python presents itself as much more than a simple wrapper around C. Just two different perspectives. Unfortunately, your runtime explanations still don't address the DRY issue. :-/
I needed to think about this further when Guido mentioned it. But I like it now. If you check https://github.com/srkunze/fork/tree/asyncio , I already started working on integrating asyncio into xfork a long time ago. But I still couldn't wrap my mind around it and it stalled. But IIRC, I would have implemented a shadow thread solution as well. So, if his idea goes into the stdlib first, I welcome it even more, as it would do the heavy lifting for me. xfork would then be just a common interface to threads, processes and coroutines. Cheers, Sven

On 10/05/2016 12:20 PM, Sven R. Kunze wrote:
On 05.10.2016 18:06, Nick Coghlan wrote:
At this point I'm willing to bet that you (Sven) are closest to actually having a shadow thread thingy that actually works. Maybe some other asyncio folks would be willing to help you develop it? -- ~Ethan~

Excellent point. For me CPython, Jython, IronPython and PyPy are the same (99.9%), and the important part is Python the language. For a long time I tested PYWORKS against all implementations and was happy that it ran on all of them. Clearly, for others CPython (incl. runtime and C bindings) is the reality and the others are far from the same, especially because of the missing C integration. But are the runtimes for Python and Erlang that fundamentally different? Is it Python's tight integration with C that is the big difference? When I first read about the async idea, I initially expected that it would be some Stackless-like additions to Python. My wish for Python was an addition to the language that allowed an easy and elegant concurrency model at the language level. Ideally, a Python program with 1000 async objects parsing a 10TB in-memory XML file should run (almost) twice as fast on an 8-core CPU, compared to a 4-core ditto.
xfork (as pyworks) implements a proxy object, which "almost" behaves like the real object, but it is still a proxy. If fork (or spawn, chan, async, whatever) were part of the language it would be cleaner. br /Rene

On 5 October 2016 at 21:28, Rene Nejsum <rene@stranden.com> wrote:
But, are the runtimes for Python and Erlang that fundamentally different? Is it Python’s tight integration with C that is the big difference?
I don't know *that* much about Erlang, but Python's model is that of a single shared address space with (potentially multiple) threads of code running, having access to that address space. Erlang's model is that of multiple threads of execution (processes) that are isolated from each other (they have independent address spaces). That's a pretty fundamental difference, and gets right to the heart of why async is fundamentally different in the two languages. It also shows in Erlang's C FFI, which as I understand it is to have the C code isolated in a separate "process", and the user's program communicating with it through channels. As far as I can see, that's a direct consequence of the fact that you couldn't safely expect to call a C function (with its direct access to the whole address space) direct from an Erlang process. Python's model is very similar to C (and Java, and C#/.net, and many other "traditional" languages [1]). That's not "to make it easier to call C functions", it's just because it was a familiar and obvious model to use, known to work well, when Python was first developed. The fact that it made calling C from Python easy was a side effect - one that helped make Python as useful and popular as it is today, but nevertheless simply a side effect of the model. Paul [1] And actual computer hardware, which isn't a coincidence :-)

Paul Moore wrote:
I don't know much about Erlang either, but from what I gather, it's a functional language. That removes a lot of potential problems with concurrency right from the beginning. You can't have trouble with mutation of shared state if you can't mutate state in the first place. :-) -- Greg

On Wed, Oct 5, 2016 at 1:28 PM, Rene Nejsum <rene@stranden.com> wrote:
When I first read about the async idea, I initially expected that it would be some Stackless-like additions to Python. My wish for Python was an addition to the language that allowed an easy and elegant concurrency model at the language level. Ideally, a Python program with 1000 async objects parsing a 10TB in-memory XML file should run (almost) twice as fast on an 8-core CPU, compared to a 4-core ditto.
I think there's two fundamentally different layers getting conflated here, which is really confusing the issue. Layer 1 is the user API for concurrency. At this layer, there are two major options in current Python. The first option is the "implicit interleaving" model provided by classic threads, stackless, gevent, goroutines, etc., where as a user you write regular "serial" code + some calls to thread spawning primitives, and then the runtime magically arranges for multiple pieces of "serial" code to run in some kind of concurrent/parallel fashion. One downside of this approach is that because the runtime gets to arbitrarily decide how to interleave the execution of these different pieces of code, it can be difficult for the user to reason about interactions between them. So this motivated the second option for user APIs: the "explicit interleaving" model where as a user you annotate your code with some sort of marker saying where it's willing to be suspended (Python uses the "await" keyword), and then the runtime is restricted to only running one piece of code at a time, and only switching between them at these explicitly marked points. (The canonical reference on this is https://glyph.twistedmatrix.com/2014/02/unyielding.html) (I like to think about this as opt-out concurrency vs opt-in concurrency: the first model is concurrent by default except where you explicitly use a mutex; the second is serial by default except where you explicitly use "await".) So that's the user API level. Then there's Layer 2, the strategies that the runtime underneath uses to implement whichever semantics are in play. There are a lot of options here -- in particular, within the "implicit interleaving" model Python has existing production-ready implementations using OS level threads with a GIL (CPython's threading module), clever C stack manipulation tricks on a single OS level thread (gevent), OS level threads without a GIL (Jython's threading module), etc., etc. Picking between these is an implementation trade-off, not a language-level semantics trade-off -- from the point of view of the user API, they're pretty much interchangeable. ...And in principle you could also use any of these options to implement the "explicit interleaving" approach. For example, each coroutine could get assigned its own OS level thread, and then to get the 'await' semantics you could have a shared global lock that gets dropped when entering an 'await' and then re-acquired afterwards. This would be silly and inefficient compared to what asyncio actually does (it uses a single thread, like gevent), so no-one would do this. But my point is that at the user API level, again, these are just implementation details -- this would be a valid way to implement the async/await semantics. So what can we conclude from all this? First, if your goal is to write code that gets faster when you add more CPU cores, then that means you're looking for a particular implementation strategy: you want OS level threads, and no GIL. One way to do this would be to keep the Python language semantics the same, while modifying CPython's implementation to remove the GIL. This turns out to be really hard :-). But Jython demonstrates that the existing APIs are sufficient to make it possible -- the difficulties are in the CPython implementation, not in the language, so that's where it would need to be fixed. If someone wants to push this forward probably the thing to do is to see how Larry's "gilectomy" project is doing and help it along. 
Another strategy would be to come up with some new user API that can be added to the language, and whose semantics are more amenable to no-GIL-multithreading. There are lots of somewhat nascent ideas out there -- IIRC Eric's been thinking about using subinterpreters to add shared-nothing threads (versus the shared-everything threads which Python currently supports -- shared nothing is what Erlang does), there's Armin's experiments with STM in PyPy, there's PyParallel, etc. Nick has a good summary: http://python-notes.curiousefficiency.org/en/latest/python3/multicore_python... But -- and this is the main point I've been leading up to -- async/await is *not* the new user-level API that you're looking for. Async/await were created to enable the "explicitly interleaved" style of programming, which as we saw above effectively takes the GIL and promotes it to becoming an explicit part of the user API, instead of an implementation detail of the runtime. This is the one and only reason async/await exist -- if you don't want to explicitly control where your code can switch "threads" and be guaranteed that no other code is running at the same time, then there is no reason to use async/await. So I think the objection to async/await on the grounds that they clutter up the code is based on a misunderstanding of what they're for. It wasn't that we created these keywords to solve some implementation problem and then inflicted them on users. It's exactly the other way around. *If* you as a user want to add some explicit annotations to your code to control how parallel execution can be interleaved, *then* there has to be some keywords to write those annotations, and that's what async/await are. And OTOH if you *don't* want to have markers in your code to explicitly control interleaving -- if you prefer the "implicit interleaving" style -- then async/await are irrelevant and you shouldn't use them, you should use threading/gevent/whatever. -n -- Nathaniel J. Smith -- https://vorpus.org
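The threads-plus-global-lock thought experiment from the middle of that message can be rendered as a toy (deliberately silly and inefficient) sketch; all names here are invented:

    import threading

    _big_lock = threading.Lock()   # plays the role of the GIL/event loop

    def spawn(fn, *args):
        # One OS thread per "coroutine", but they run strictly one at
        # a time, because each must hold the global lock to proceed.
        def run():
            with _big_lock:
                fn(*args)
        thread = threading.Thread(target=run)
        thread.start()
        return thread

    def switch_point():
        # The moral equivalent of 'await': the only place where
        # another "coroutine" may be scheduled.
        _big_lock.release()
        _big_lock.acquire()

Functionally this gives the explicit-interleaving semantics; asyncio just achieves the same thing far more cheaply on a single thread.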

Nathaniel Smith wrote:
It wasn't that we created these keywords to solve some implementation problem and then inflicted them on users.
I disagree -- looking at the history of how we ended up with async/await, it looks to me like this is exactly what *did* happen. First we had generators. Then 'yield from' was invented to (among other things) leverage them as a way of getting lightweight threads. Then 'await' was introduced as a nicer way to spell 'yield from' when using it for that purpose. Saying that 'await' is good for you because it makes the suspension points visible seems to me a rationalisation after the fact. It was something that emerged from the implementation, not a prior design requirement. -- Greg

On Thu, Oct 6, 2016 at 12:45 AM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
I wasn't trying to write a detailed account of the development, as much as try to capture some essential features. Myth, not history :-). In the final design, the one and only thing that distinguishes async/await from gevent is that in the former the suspension points are visible, and in the latter they aren't. I don't really believe that it's an accident that people put a lot of effort into creating async/await in this way at a time when gevent already existed and was widely used in production, and we have historical documents like Glyph's blog arguing for visible yield points as a motivation for async/await, but... even if you think it *was* an accident, it hardly matters at this point. The core distinguishing feature between async/await and gevent is the visibility of suspension points, so it might as well be the case that async/await is designed for exactly those people who want visible suspension points. (And I didn't say await or visible suspension points are necessarily "good for you" -- obviously the implicit and explicit interleaving approaches have trade-offs you'll have to judge for yourself. But there are some people in some situations who want explicit interleaving, and async/await is there for them.) -n -- Nathaniel J. Smith -- https://vorpus.org

Nathaniel Smith wrote:
They're not quite independent axes, though. Gevent is based on greenlet, which relies on some slightly dubious tricks at the C level and doesn't play well with some external libraries. As far as I know, there's no current alternative that's just as efficient and portable as asyncio but without the extra keywords. If you want the full benefits of asyncio, you're forced to accept explicit suspension points. -- Greg

On Thu, Oct 6, 2016 at 4:12 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
I'd be interested to hear more about this. gevent/greenlet don't seem to have an official "list of supported platforms" that I can find, but I can't find concrete examples of unsupported platforms either. Are we talking like, HPUX-on-MIPS or...? And obviously there are always going to be some cases that are better supported by either one tool or another, but as we've seen getting external libraries to play well with asyncio is also pretty non-trivial (exactly because of those explicit suspension points!), and my impression was that for now gevent actually had a larger ecosystem. For folks who prefer the gevent API, is it really easier to port libraries to asyncio than to port them to gevent? -n -- Nathaniel J. Smith -- https://vorpus.org

On 7 October 2016 at 16:42, Nathaniel Smith <njs@pobox.com> wrote:
It's definitely *not* easier, as gevent lets you suspend execution inside arbitrary CPython magic method calls. That's why you can still use SQL Alchemy's ORM layer with gevent - greenlet can swap the stack even with the extra C call frames on there. If you're running in vanilla CPython (or recent non-Windows versions of PyPy2), on a relatively mainstream architecture like x86_64 or ARM, then gevent/greenlet will be fine as an application's synchronous/asynchronous bridge. However, if you're running in a context that embeds CPython inside a larger application (e.g. mod_wsgi inside Apache), then gevent's assumptions about how the C thread states are managed may be wrong, and hence you may be in for some "interesting" debugging sessions. The same goes for any library that implements callbacks that end up executing a greenlet switch when they weren't expecting it (e.g. while holding a threading lock - that will protect you from other OS threads, but not from other greenlets in the same thread). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

I can speak to this. It's been my professional experience with gevent that choosing to obtain concurrency by using gevent, as opposed to explicit async, was a trade-off: we replaced a large amount of drudge work in writing a codebase with async/await pervasively throughout it with a smaller amount of dramatically (10x to 100x) more intellectually challenging debugging work when unstated assumptions regarding thread-safety and concurrent access were violated. For many developers these trade-offs are sensible and reasonable, but we should all remember that there are costs and advantages to most kinds of runtime model. I'm happy to have a language that lets me do all of these things rather than one that chooses one for me and says "that ought to be good enough for everyone". Cory

On 6 October 2016 at 17:45, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
I'd say it emerged from most folks still not grasping generators-as-coroutines a decade after PEP 342, and asynchronous IO in general ~15 years after Twisted was first released. When a language usage pattern is supported for that long, but folks still don't grok how it might benefit them, you have a UX problem, and one of the ways to address it is to take the existing pattern and give it dedicated syntax, which is exactly what PEP 492 did. Dedicated syntax at least dramatically lowers the barrier to *recognition* of the coroutine design pattern when it's being used, and can help with explaining it as well (since the overlap with other concepts in the language becomes a hidden implementation detail rather than being an essential part of the user experience). The shadow thread idea will hopefully prove successful in addressing the last major rough spot in the UX, which is the ability to easily integrate asynchronous components into an otherwise synchronous application. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

2016-10-06 13:50 GMT+02:00 Nick Coghlan <ncoghlan@gmail.com>:
That's my opinion as well. If I had to run asyncio coroutines from synchronous code, I'd probably take advantage of the Executor interface defined by concurrent.futures. Executors handle resource management through a context manager interface, which is a good way to start and clean up after the shadow thread. Also, the submit method returns a concurrent.futures.Future, i.e. the standard for accessing an asynchronous result from synchronous code. Here's a simple implementation: https://gist.github.com/vxgmichel/d16e66d1107a369877f6ef7e646ac2e5 If this is not enough (say one wants to write a synchronous API to an asynchronous library), then it simply is a matter of instantiating the executor once in the module and wrapping all the coroutines to expose with executor.submit and Future.result. This might provide an acceptable answer to the DRY thing that has been mentioned a few times, though I'm not convinced it is such a problematic issue (at least nothing that sans-io doesn't already address in the first place).
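In the same spirit as the linked gist (this sketch is mine, not a copy of it), an Executor subclass wrapping a shadow event loop thread might look like:

    import asyncio
    import threading
    from concurrent.futures import Executor

    class LoopExecutor(Executor):
        # submit() schedules a coroutine function on a dedicated event
        # loop thread and returns a concurrent.futures.Future.
        def __init__(self):
            self._loop = asyncio.new_event_loop()
            self._thread = threading.Thread(
                target=self._loop.run_forever, daemon=True)
            self._thread.start()

        def submit(self, fn, *args, **kwargs):
            return asyncio.run_coroutine_threadsafe(
                fn(*args, **kwargs), self._loop)

        def shutdown(self, wait=True):
            self._loop.call_soon_threadsafe(self._loop.stop)
            if wait:
                self._thread.join()

    # Synchronous usage; the context manager handles start-up/clean-up:
    with LoopExecutor() as executor:
        future = executor.submit(asyncio.sleep, 1, result=42)
        print(future.result())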

Nick Coghlan wrote:
However, it was just replacing one way of explicitly marking suspension points ("yield from") with another ("await"). The fact that suspension points are explicitly marked was driven by the implementation from the beginning. When I first proposed "yield from" as an aid to using generators as coroutines, my intention was always to eventually replace it with something else. PEP 3152 was my proposal for what the something else might be. I initially regarded it as a wart that it still required a special syntax for suspendable calls, and felt the need to apologise for that. I was totally surprised when people said they actually *liked* the idea of explicit suspension points. -- Greg

On 6 October 2016 at 05:20, Sven R. Kunze <srkunze@mail.de> wrote:
It's not a question that's up for debate - as a point of factual history, Python's runtime model is anchored in the C runtime model, and this pervades the entire language design. Simply wishing that Python's core runtime design was other than it is doesn't make it so. We can diverge from that base model when we decide there's sufficient benefit in doing so (e.g. the object model, the import system, the numeric tower, exception handling, lexical closures, generators, generators-as-coroutines, context management, native coroutines), but whatever we decide to do still needs to be expressible in terms of underlying operating system provided C primitives, or CPython can't implement it (and if CPython can't implement a feature as the reference implementation, that feature can't become part of the language definition). Postponing the point at which folks are confronted by those underlying C-level constraints is often an admirable goal, though - the only thing that isn't possible without fundamentally changing the language is getting rid of them entirely. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 2016-10-06 03:27, Nick Coghlan wrote:
That may be true, but the limitation there is Python's core runtime model, not C's. As you say, Python's runtime model is historically anchored in C, but that doesn't mean C's runtime model itself directly constrains Python's. As others have mentioned, there are plenty of other languages that are themselves written in C but have different runtime models. The constraint is not compatibility with the C runtime model, but backward compatibility with Python's own earlier decisions about its own runtime model. This may sound like an academic point, but I just want to mention it because, as you say later, hiding C from the Python programmer is often an admirable goal. I would go so far as to say it is almost always an admirable goal. The Python runtime isn't going to suddenly change, but we can make smart decisions about incremental changes in a way that, over time, allows it to drift further from the C model, rather than adding more and more tethers linking it more tightly to the C model.
Sure. But over the long term, almost anything is possible. As I said above, my own opinion is that hiding C from Python users is almost always a good thing. I (and I think many other people) use Python because I like Python. If I liked C I would use C. To the extent that Python allows C to constrain it (or, more specifically, allows the nature of C to constrain people who are only writing Python code), it limits its ability to evolve in a way that frees users from the things they don't like about C. This is kind of tangential to the current issue about async. To be honest I am quite ignorant of how async/await will help or hurt me as a Python user. As you say, certain constraints are unavoidable. (We don't have to use C's runtime model, but we do have to be able to write our runtime model in C.) But I think it's good, when thinking about these features, to think how they will constrain future language development versus opening it up. If, for instance, people start using async/await and old-school generator-send-style coroutines become unused, it will be easier to deprecate generator-send in the distant future. On the flip side, I would hate to see decisions made that result in lots of Python code that "bakes in" specific runtime model assumptions, making it more difficult to leave those assumptions behind in the future. -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown

Nick Coghlan writes:
How can there be a conflict between Python implementing the C runtime model *itself* which says "you can do anything anywhere anytime", and some part of Python implementing the more restricted models that allow safe concurrency? If you can do anything, well, you can voluntarily submit to compiler discipline to a restricted set. No? So it must be that the existing constructions (functions, for, with) that need an "async" marker have an implementation that is itself unsafe. This need is not being explained very well. What is also not being explained is what would be lost by simply using the "safe" implementations generated by the async versions everywhere. These may be hard to explain, and I know you, Yury, and Guido are very busy. But it's frustrating for all to see this go around in a circle: "it's like it is because it has to be that way, so that's the way it is". There's also the question of "is async/await really a language feature, or is it patching up a deficiency in the CPython implementation that other implementations don't necessarily have?" (which has been brought up before, in less contentious terms).
That's understood, of course. The question that isn't being answered well is "why can't that non-C-like runtime concept be like Go or Erlang or Rust?" Or, less obtusely, "what exactly is the 'async' runtime concept, and why is it preferred to the concepts implemented by Go or Erlang or Rust or gevent or greenlets or Stackless?" I guess the answer to "why not Stackless?" is buried in the archives for Python-Dev somewhere, but I need to get back to $DAYJOB, maybe I'll look it up later.

Agree, well put. The Erlang runtime (VM) is also written in C, so anything should be possible. I do not advocate that Python should become a “new” Erlang or Go; I am just saying that since we are introducing some level of concurrency in Python, we should look at some of the elegant ways others have achieved this and try to implement something similar in Python.
I understand that there are a lot of backwards compatibility concerns, especially in regard to the Python/C interface, but I think it is possible to find an elegant solution to this.
This would be very interesting to understand.
I will try to look for that. I have some time on my hands, not sure I have the %BRAINSKILL, but nevertheless… br /Rene

On 6 October 2016 at 15:15, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
Anything is possible in C, but not everything is readily supportable :)

When you design a new language and runtime from scratch, you get to set new rules and expectations if you want to do that. Ericsson did it with Erlang and BEAM (the reference Erlang VM) by declaring "Everything's an Actor in the 'Actor Model' sense, and Actors can send messages to each other's mailboxes". That pushes you heavily towards application designs where each "process" is a Finite State Machine with state changes triggered by external events, or by messages from other processes. If BEAM had been published as open source a decade earlier than it eventually was, I suspect the modern computing landscape would look quite different from the way it does today.

Google did something similar with Golang and goroutines by declaring that Communicating Sequential Processes would be their core concurrency primitive rather than C's shared memory threading.

By contrast, Python, C++, Java, C#, and Objective-C all retained C's core thread-based "private stack, shared heap" concurrency model, which later expanded to also include thread-local heap storage. Rust actually retains this core "private stack, private heap, shared heap" model, but changes the management of data ownership to avoid the messy problems that arise in practice when using the "everything is accessible to every thread by default" model.
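That contrast can at least be sketched in today's Python — a minimal, illustrative example only, with queue.Queue standing in for an Erlang mailbox or Go channel and a made-up "double the number" task:

    import queue
    import threading

    def worker(inbox, outbox):
        # A crude "actor": it owns no shared state and reacts only to
        # messages arriving in its private mailbox (a queue).
        for item in iter(inbox.get, None):   # None is the stop sentinel
            outbox.put(item * 2)

    inbox = queue.Queue()
    outbox = queue.Queue()
    threading.Thread(target=worker, args=(inbox, outbox), daemon=True).start()

    inbox.put(21)
    print(outbox.get())   # 42 -- communication instead of shared memory
    inbox.put(None)       # tell the worker to exit

The point of both the actor and CSP models is the same: the worker shares no mutable state with its callers, so no locks are needed around the data.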
Correct (for a given definition of unsafe): in normal operation, CPython uses the *C stack* to manage the Python frame stack, so when you descend into a new function call in CPython, you're also using up more C level stack space. This means that when CPython throws RecursionError, what it's actually aiming to prevent is a C level segfault arising from running out of stack space to manage frames:

$ ./python -X faulthandler
Python 3.6.0b1+ (3.6:b995b1f52975, Sep 22 2016, 01:19:04)
[GCC 6.1.1 20160621 (Red Hat 6.1.1-3)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.setrecursionlimit(int(1e6))
>>> def f(): f()
...
>>> f()
Fatal Python error: Segmentation fault

Current thread 0x00007fe977a7c700 (most recent call first):
  File "<stdin>", line 1 in f
  File "<stdin>", line 1 in f
  File "<stdin>", line 1 in f
  [<manual snip>]
  ...
Segmentation fault (core dumped)

Loops, with statements and other magic method invocations all work that way - they make a C level call to the magic method implementation, which may end up running a new invocation of the eval loop to evaluate the bytecode of a magic method implementation that's written in Python.

The pay-off that CPython gets from this is that we get to delegate 99.9% of the work for supporting different CPU architectures to C compiler developers, and we get a lot of capabilities "for free" when it comes to stack management.

The downside is that C runtimes don't officially support swapping out the stack of the current thread with new contents. It's *possible* to do that (hence Stackless and gevent), but you're on your own when it comes to debugging it when it breaks. That makes it a good candidate for an opt-in "expert users only" capability - folks that decide gevent is the right answer for their needs can adopt it if they want to (perhaps restricting their choice of target platform and C extension modules as a result), while we (as in the CPython core devs) don't need to keep custom stack manipulation code working on all the platforms where CPython is supported and with all the custom C extension modules that are out there.
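To make the magic-method re-entry above concrete, here is a tiny sketch (a hypothetical class, nothing CPython-specific): the C code implementing the with statement calls back into pure-Python methods, each running as a fresh invocation of the eval loop with a new Python frame — and more C stack:

    class Tracked:
        # Pure-Python magic methods: the C implementation of 'with'
        # calls these, re-entering the eval loop to run their bytecode.
        def __enter__(self):
            print("entering")
            return self

        def __exit__(self, exc_type, exc, tb):
            print("exiting")
            return False   # don't suppress exceptions

    with Tracked():
        print("inside")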
The two main problems with that idea are speed and extension module compatibility.

The speed aspect is simply that we have more than 4 decades behind us of CPU designers and compiler developers making C code run fast. CPython uses that raw underlying speed to offer a lot of runtime flexibility with a relatively simple implementation while still being "fast enough" for many use cases. Even then, function calls are still notoriously slow, and await invocations tend to be slower still.

The extension module compatibility problem is simply that whereas you can emulate a normal Python function just by writing a normal C function, emulating a Python coroutine involves implementing the coroutine protocol. That's possible, but it's a lot more complicated, and even if you implemented a standard wrapper, you'd be straight back to the speed problem.

Cheers, Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
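For reference, the protocol Nick mentions can be sketched in pure Python — a hand-rolled awaitable, driven manually the way an event loop would drive it (illustrative only; a C extension posing as a coroutine has to emulate this same send()/StopIteration dance):

    class FakeCoroutine:
        # Hand-rolled awaitable: roughly the protocol an extension
        # module would have to implement to pose as a coroutine.
        def __await__(self):
            yield "suspended"   # each yield is a suspension point
            return 42

    async def main():
        result = await FakeCoroutine()
        print("got", result)

    # Drive the coroutine by hand, the way an event loop would:
    coro = main()
    print(coro.send(None))   # -> "suspended" (bubbled up from __await__)
    try:
        coro.send(None)      # resume; main() runs to completion
    except StopIteration:
        pass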

Nick Coghlan wrote:
One of the main benefits is that it's very easy for external code to make callbacks to Python code. The original implementation of Stackless decoupled the eval stack from the C stack, but at the expense of making the API for calling external C code much less straightforward. -- Greg

On 2016-10-06 1:15 AM, Stephen J. Turnbull wrote:
To add to what Nick said. I myself would want to use a time machine to help design the CPython runtime to allow Golang-style concurrency (although Golang has its own bag of problems). Unfortunately there is no time machine, and implementing that in CPython today would be an impossibly hard and long task.

To start, no matter how exactly you want to approach this, it would require us to do a *complete rewrite* of CPython internals. This is so complex that we wouldn't be able to even estimate how long it would take us. This would be a far more significant change than Python 2->3. BTW in the process of doing that, we would have to completely redesign the C API, which would effectively kill the entire numpy/scipy ecosystem. If someone disagrees with this, I invite them to go ahead and write a PEP (please!)

On the other hand, async/await and non-blocking IO make it possible to write highly concurrent network applications. Even languages with good support for threading, such as C#, have async/await [sic!]. Even Rust users want them, and will likely add them to the language or std lib. Even C++ might have coroutines soon. Why? Because Rust and C# can't "just" implement the actors model. Because threads are hard: deadlocks, and code that is hard to reason about. Because threads can't scale as well as non-blocking IO.

We probably could implement actors if we decided to merge Stackless or use greenlets in the core. Anyone who has looked at/debugged the implementation of greenlets would say it's a bad idea. And gevent is available for those who want to use them anyway.

In the end, async/await is the only *practical* solution for a language like Python. Yes, it's a bit harder to design libraries that support both synchronous and asynchronous APIs, but there's a way: separate your protocol parsing from IO. When done properly, it's easier to write unittests and it's a no-brainer to add support for different IO models.

Yury
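A toy sketch of that "separate protocol parsing from IO" idea, using a made-up line-based protocol (not any real library's API): the parser only ever sees bytes, so the identical class can sit behind a blocking socket, an asyncio transport, or a unit test that simply feeds it byte strings:

    class LineProtocol:
        # IO-free protocol core: bytes in, parsed lines out.
        # No sockets, no event loop, no blocking calls anywhere.
        def __init__(self):
            self._buffer = b""

        def feed(self, data):
            self._buffer += data
            *lines, self._buffer = self._buffer.split(b"\n")
            return lines

    proto = LineProtocol()
    print(proto.feed(b"HELLO\nWOR"))   # [b'HELLO'] (b'WOR' stays buffered)
    print(proto.feed(b"LD\n"))         # [b'WORLD']

A synchronous driver loops over sock.recv() and feeds the parser; an asyncio driver awaits reader.read() and feeds the very same parser, so the protocol logic is written and tested once.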

Regarding the Python C-runtime and async, I just had a good talk with Kresten Krab at Trifork. He implemented “Erjang”, the Java implementation of the Erlang VM (www.erjang.org). Doing this he had access to the Erlang (C) VM. It turns out that the Erlang VM and the Python VM have a lot of similarities, and the differences are more in the language than in the VM.

Differences between the Erlang VM and Python related to async are:

1) Most variables in Erlang are immutable, making it easier to have coroutines

2) Coroutines are built into Erlang via the “spawn” keyword, leaving the specific implementation to the VM, but never implemented with OS threads

3) All coroutines have their own heap and stack (initially 200 bytes), which can grow as needed

4) Coroutines are managed in a “ready-queue”, from which the VM thread executes the next ready job; each job gets 2000 “instructions” (or runs until it blocks on IO), and then the next coroutine is executed

Because of this, when multicore CPUs entered the game, it was quite easy to change the Erlang VM to add a thread per core pulling from the ready-queue. This makes an Erlang program run (almost) twice as fast every time the number of cores is doubled!

Given this, I am still convinced that:

obj = async SomeObject()

should be feasible, even though there will be some “golang”-like issues about shared data, but there could be several ways to handle this.

br /Rene
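There is no such syntax today, of course, but the calling convention can be approximated in current Python. A rough sketch with made-up names (a thread-backed proxy; not PYWORKS' actual implementation):

    from concurrent.futures import ThreadPoolExecutor

    class AsyncProxy:
        # Wraps any object so each method call is submitted to a worker
        # thread and returns a future; .result() blocks only when the
        # value is actually needed (the "implicit await" of the proposal).
        def __init__(self, obj, workers=1):
            self._obj = obj
            self._pool = ThreadPoolExecutor(max_workers=workers)

        def __getattr__(self, name):
            method = getattr(self._obj, name)
            return lambda *a, **kw: self._pool.submit(method, *a, **kw)

    class SomeObject:
        def some_method(self):
            return 42

    o = AsyncProxy(SomeObject())   # roughly: o = async SomeObject()
    r = o.some_method()            # returns immediately with a future
    # ... other code can run here ...
    print(r.result())              # 42 -- blocks only when referenced

Note that the GIL still prevents true parallelism for pure-Python work here; the sketch shows only the calling convention, not Erlang-style scheduling.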

The problem is that if your goal is to make a practical proposal, it's not enough to look at Python-the-language. You're absolutely right, AFAICT there's nothing stopping someone from making a nice implementation of Python-the-language that has erlang-style cheap shared-nothing threads with some efficient message-passing mechanism.

But! It turns out that unless your new implementation supports the CPython C API, it's almost certainly not viable as a mainstream CPython alternative, because there's this huge huge pile of libraries that have been written against that C API. You're not competing against CPython, you're competing against CPython+thousands of libraries that you don't have and that your users expect. And unfortunately, it turns out that the C API locks in a bunch of the implementation assumptions (refcounting, the GIL, use of the C stack, poor support for isolation between different interpreter states, ...) that you were trying to get away from.

I mean, in many ways it's a good problem to have, that our current ecosystem is just so attractive that it's hard to compete with! (Though a pessimist could point out that this difficulty with competing with yourself is exactly what tends to eventually undermine incumbents -- cf. the innovator's dilemma.) And it's "just" a matter of implementation, not Python-the-language itself. But the bottom line is: this is *the* core problem that you have to grapple with if you want to make any radical improvements in the Python runtime and have people actually use them.

-n

On Mon, Oct 17, 2016 at 9:36 AM, Rene Nejsum <rene@stranden.com> wrote:
-- Nathaniel J. Smith -- https://vorpus.org

You are right about the importance of the Python C API; it often goes under my radar. For the past 20 years I have only used it a couple of times (to integrate Python into some existing C code), therefore it is not as much in focus for me as it should be, and it definitely is for others. I get your innovator's dilemma all too well, just look at Python 3 and the time it took us to shift from 2. But, watching Larry Hastings' talk on his awesome gilectomy project, it was my understanding that he at least saw it as a possibility to do a backward compatible extension of the C-API for his GIL removal project. As I understand it, he proposes that the Python runtime should check whether a given C lib has been upgraded to support non-GIL operation, and if not, run it as an old version. I am not sure how much it will take in this case, but I thought “hey, if Larry Hastings is removing the GIL and proposing an extension to the C-API, at least it can be done” :-) /Rene

On 05.10.2016 08:49, Rene Nejsum wrote:
As a result of past discussions, I wrote the module "xfork" which basically does this "golang goroutine" stuff. It's just a thin wrapper around "futures", but it allows one to avoid what René and Anthony object to.

I had a look at xfork, and really like it. It is implemented much like the lower level of PYWORKS, and PYWORKS could build on xfork instead.
Thanks. :)
I think that the “model” of doing async should be defined in the Python language/runtime (like in Go, Erlang, ABCL). In the ideal case it should be up to the runtime implementation (CPython, PyPy, Jython, IronPython etc.) how the asynchronous behaviour is implemented (greenlets, threads, roll-your-own, etc.)
That's the way I see it as well. The Python language is extremely high-level. So, I guess in most cases, most people would just use the default implementation. Cheers, Sven
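For the curious, the "thin wrapper around futures" shape might look roughly like this (an illustrative sketch only, not xfork's actual API):

    from concurrent.futures import ThreadPoolExecutor

    _pool = ThreadPoolExecutor(max_workers=4)

    def fork(fn, *args, **kwargs):
        # Go-style "go fn(...)": run fn in the background, hand back
        # a future whose .result() joins it.
        return _pool.submit(fn, *args, **kwargs)

    def fib(n):
        return n if n < 2 else fib(n - 1) + fib(n - 2)

    task = fork(fib, 25)        # starts immediately, concurrently
    print("main thread keeps going")
    print(task.result())        # 75025 -- blocks only here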

I agree 100%. Ideally I think a language (would love it to be Python) should permit many (millions) of what we know as coroutines, and then have as many threads as the CPU has cores to execute these coroutines, but I do not think you as a programmer should be especially aware of this as you code. (Just like GC handles your alloc/free, the runtime should handle your “concurrency”.)
People want to know how they are supposed to write unified, non-insane-and-ugly code in this a/sync Python 2/3 world we now find ourselves in. I've been eagerly watching this thread for the answer, thus far to no avail.
Agree
Sans-io suggests we write bite-sized synchronous code that can be driven by a/sync consumers. While this is all well and good, how does one write said consuming library for both I/O styles without duplication?
The answer seems to be "write everything you ever wanted as async and throw some sync wrappers around it". Which means all the actual code I write will be peppered with async and await keywords.
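That "async core, sync facade" pattern, as a minimal hypothetical sketch:

    import asyncio

    async def fetch_answer():
        # The "real" implementation, written once, async-first.
        await asyncio.sleep(0.1)      # stand-in for real network IO
        return 42

    def fetch_answer_sync():
        # Thin blocking facade for synchronous callers.
        loop = asyncio.get_event_loop()
        return loop.run_until_complete(fetch_answer())

    print(fetch_answer_sync())        # usable from plain sync code

It works, but it illustrates the complaint: the real logic lives in async-land, and every synchronous entry point needs its own wrapper.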
Have a look at the examples in David Beazley’s curio; he is one of the most knowledgeable Python people I have met, but that code is almost impossible to read and understand.
In Go I can spawn a new control state (goroutine) at any time against any function. This is clear in the code. In Erlang I can spawn a new control state (Erlang process) at any time and it's also clear. Erlang is a little different because it will preempt me, but the point is I am simply choosing a target function to run in a new context. Gevent and even the threading module are other examples of this pattern.
Having thought some more about it, I think that putting async in front of the object could be kind of like a channel in Go and other languages?
In all reality you don't typically need many suspension points other than around I/O, and occasionally heavy CPU, so I think folks are struggling to understand (I admit, myself included) why the runtime doesn't want to be more help and instead punts back to the developer.
Well put, we are definitely on the same page here, thank you. br /Rene
--
C Anthony

On Mon, Oct 3, 2016 at 10:37 PM, Rene Nejsum <rene@stranden.com> wrote:
There's a problem with this model (of using all CPUs to run coroutines), since when you have two coroutines that can run in unspecified order but update the same datastructure, the current coroutine model *promises* that they will not run in parallel -- they may only alternate running if they use `await`. This promise implies that you can update the datastructure without worrying about locking as long as you don't use `await` in the middle. (IOW it's non-pre-emptive scheduling.)

If you were to change the model to allow multiple coroutines being executed in parallel on multiple CPUs, such coroutines would have to use locks, and then you have all the problems of threading back in your coroutines! (There might be other things too, but there's no way to avoid a fundamental change in the concurrency model.)

Basically you're asking for Go's concurrency model -- it's nice in some ways, but asyncio wasn't made to do that, and I'm not planning to change it (let's wait for a GIL-free Python 4 first). I'm still trying to figure out my position on the other points of discussion here -- keep discussing!

--
--Guido van Rossum (python.org/~guido)
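That promise can be spelled out in a short sketch (a hypothetical shared counter, with asyncio.sleep(0) as an explicit suspension point): an update that spans an await can interleave with other coroutines and lose writes, while an update with no await in the middle cannot:

    import asyncio

    counter = {"value": 0}

    async def lossy(n):
        for _ in range(n):
            v = counter["value"]       # read ...
            await asyncio.sleep(0)     # suspension point: others run here
            counter["value"] = v + 1   # ... write: unsafe across an await

    async def safe(n):
        for _ in range(n):
            counter["value"] += 1      # no await between read and write:
            await asyncio.sleep(0)     # no other coroutine can intervene

    loop = asyncio.get_event_loop()
    loop.run_until_complete(asyncio.gather(lossy(1000), lossy(1000)))
    print(counter["value"])            # far less than 2000: lost updates

    counter["value"] = 0
    loop.run_until_complete(asyncio.gather(safe(1000), safe(1000)))
    print(counter["value"])            # exactly 2000: the promise holds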

Well, yes and no. In other languages (Java/C#) where I have implemented concurrent objects a la PYWORKS it works pretty well, as long as you have fewer than maybe 10,000 threads. But, in Python (CPython2 on a multicore CPU) threads do not work! The GIL makes it impossible to have, for example, 100 threads sending messages between each other (see the Ring example in PYWORKS); that's one reason why it would be interesting to have some kind of concurrency support built into the Python runtime. Today I see all kinds of tricks and workarounds to get around the GIL, ranging from starting several Python interpreters to difficult-to-read code using yield (now async/await), but when you have seen much more elegant support (Go, Erlang, maybe even ABCL) you kind of wish this could be added to your own favourite language. br /Rene