So, I've been playing a bit with the information I saw in this thread (thank you all for the responses) and I got something super simple working: https://gist.github.com/argaen/056a43b083a29f76ac6e2fa97b3e08d1

What I like about this (and that's what I was aiming for) is that the user uses the same class/interface whether they're inside the asyncio world or not. Both `await fn()` and `fn()` work, producing the expected results.
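For anyone who doesn't want to click through, a minimal sketch of the idea (not necessarily the gist's exact code): write the implementation as a coroutine, and have a wrapper decide at call time whether to hand back the coroutine or drive it to completion, depending on whether an event loop is already running.

```python
import asyncio
import functools

def sync_or_async(coro_fn):
    """Hypothetical helper: return the coroutine when called from a
    running event loop (so the caller can ``await`` it); otherwise run
    it to completion and return the plain result."""
    @functools.wraps(coro_fn)
    def wrapper(*args, **kwargs):
        coro = coro_fn(*args, **kwargs)
        loop = asyncio.get_event_loop()
        if loop.is_running():
            return coro                        # asyncio world: await fn()
        return loop.run_until_complete(coro)   # sync world: fn()
    return wrapper

class Cache:
    @sync_or_async
    async def get(self, key):
        await asyncio.sleep(0)                 # stand-in for real async I/O
        return "value for {}".format(key)
```

With this, `Cache().get("k")` blocks and returns the value from synchronous code, while `await Cache().get("k")` works inside a coroutine.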

Now, some cons (which, in the case of my library, are acceptable):

- This aims only for asyncio compatibility; other async frameworks like trio, curio, etc. wouldn't work
- No Python 2 compatibility (although Nathaniel's idea of bleaching could still be applied)
- It presumably adds some overhead to both the sync and async versions; I will do some benchmarking when I have time (this will actually be what decides whether I do the integration or not)

Pros:

- The user is agnostic to the async/sync implementation: in the asyncio world use `await fn()`, otherwise use `fn()`. Both will work
- Classes using this approach are compatible with each other
- No duplication of code

I haven't yet thought about async context managers, iterators and so on, but I guess there is a way to handle those too (or not; I have no idea).

One fun part of all this is whether it's possible (meaning: easy) to also reuse the tests to cover both the sync and the async versions... :rolling_eyes:
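One possible shape for that, assuming pytest and the hypothetical `Cache`/`sync_or_async` sketch above: parametrize a fixture that invokes the method either directly or from inside a running loop, so one test body exercises both paths.

```python
import asyncio
import pytest

@pytest.fixture(params=["sync", "async"])
def invoke(request):
    """Call ``fn(*args)`` either directly (sync world) or from inside a
    running event loop (asyncio world)."""
    if request.param == "sync":
        return lambda fn, *args: fn(*args)

    def in_loop(fn, *args):
        async def caller():
            return await fn(*args)
        return asyncio.get_event_loop().run_until_complete(caller())
    return in_loop

def test_get(invoke):
    assert invoke(Cache().get, "k") == "value for k"
```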


On Fri, Jun 9, 2017 at 9:52 PM Yarko Tymciurak <yarkot1@gmail.com> wrote:
...so I really am enjoying the conversation.

Guido - re: "vision too far out": yes, for people struggling with async support in their libraries right now... but that is also part of my motivation. Python 5? Sure... (I may have to watch it come into use from the grave, but hopefully not... ;-) ). Anyway, from back-porting and tactical "implement now" concerns, to plans for the next release, to plans for the next version of Python, to brainstorming much less concrete future versions - it's all an interesting continuum.

Re: the GIL... sure, sort of, and sort of not. I was thinking "as long as major changes are going on... think about additional structural changes..." More to the point: as I see it, people have a hard time thinking about async in the cooperative-multitasking (CMT) sense, and so disappointments happen around blocking (missed or unexpected, e.g. from hardware failures). Cory (in his reply - and, yeah: nice writeup!) hints at what I generally like structurally:

"...we’d ideally treat asyncio as the first-class citizen and retrofit on the threaded support, rather than the other way around"

Structurally, async is lightweight overhead compared to threads, which are lightweight compared to processes, so a natural application flow runs from lightest-weight on out. To me, this seems practical for making life easier for developers, because you can imagine "promoting" an async task that is caught unexpectedly blocking to a thread, while still having the lightest-weight loop keep control over it (promotion out, as well as cancellation while promoted).

As for multiple task loops, or loops off in a thread, I haven't thought about it too much, but this seems neither new nor unreasonable. I'm thinking of the base stations our mobile connections talk through, which are multiple diskless servers that hot-promote to "master" server status on hardware failure (or during a live capacity upgrade, i.e. inserting processors). This pattern seems both reasonable and useful in this context, i.e. the concept of a master loop (which implies communication/control channels - a complication). With some thought, some reasonable ground rules, and some simplifications, I would expect much can be done.

Appreciate the discussions!

- Yarko
On Fri, Jun 9, 2017 at 1:23 PM, Guido van Rossum <guido@python.org> wrote:
Great write-up! I actually find the async nature of HTTP (both versions) a compelling reason to switch to asyncio. For HTTP/1.1 this sounds mostly like it would make the implementation easier; for HTTP/2 it sounds like it would just be better on the user side as well (if the user just wants one resource, they can safely continue to use the synchronous HTTP/1.1 version of the API).

On Fri, Jun 9, 2017 at 9:55 AM, Cory Benfield <cory@lukasa.co.uk> wrote:

On 9 Jun 2017, at 17:28, Guido van Rossum <guido@python.org> wrote:

At least one of us is still confused. The one-event-loop-per-thread model is supported in asyncio without passing the loop around explicitly. The get_event_loop() implementation stores all its state in a thread-locals instance, so it returns the thread's event loop. (Because this is an "advanced" model, you have to explicitly create the event loop with new_event_loop() and make it the default loop for the thread with set_event_loop().)
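For reference, the one-loop-per-thread pattern described above looks roughly like this:

```python
import asyncio
import threading

def worker():
    # Explicitly create a loop and install it as this thread's default;
    # get_event_loop() then returns it without the loop being passed around.
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    try:
        loop.run_until_complete(asyncio.sleep(0.01))
    finally:
        loop.close()

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```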

Aha, ok, so the confused one is me. I did not know this. =) That definitely works a lot better. It admittedly works less well if someone is doing their own custom event loop stuff, but that’s probably an acceptable limitation up until the time that Python 2 goes quietly into the night.

All in all, I'm a bit curious why you would need to use asyncio at all when you've got a thread per request anyway.

Yeah, so this is a bit of a diversion from the original topic of this thread but I think it’s an idea worth discussing in this space. I want to reframe the question a bit if you don’t mind, so shout if you think I’m not responding to quite what you were asking. In my understanding, the question you’re implicitly asking is this:

"If you have a thread-safe library today (that is, one that allows users to do threaded I/O with appropriate resource pooling and management), why move to a model built on asyncio?”

There are many answers to this question that differ for different libraries with different uses, but for HTTP libraries like urllib3, here are our reasons.

The first is that it turns out that even for HTTP/1.1 you need to write something that amounts to a partial event loop to properly handle the protocol. Good HTTP clients need to watch for responses while they’re uploading body data, because if a response arrives during that process, the body upload should be terminated immediately. This is also required for sensibly handling things like Expect: 100-continue, as well as spotting other intermediate responses and connection teardowns sensibly and without throwing exceptions.
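A rough sketch of that "partial event loop" (my illustration, not urllib3's actual code): send body chunks on a non-blocking socket while also polling for readability, and abort the upload if the server responds early.

```python
import selectors

def upload_body(sock, chunks):
    """Send body chunks while watching for an early response.
    Returns True if the whole body was sent, False if the server
    replied before the upload finished."""
    sock.setblocking(False)
    sel = selectors.DefaultSelector()
    sel.register(sock, selectors.EVENT_READ | selectors.EVENT_WRITE)
    try:
        for chunk in chunks:
            view = memoryview(chunk)
            while view:
                for _key, events in sel.select():
                    if events & selectors.EVENT_READ:
                        return False          # early response: stop uploading
                    if events & selectors.EVENT_WRITE:
                        sent = sock.send(view)
                        view = view[sent:]
        return True
    finally:
        sel.unregister(sock)
```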

Today urllib3 does not do this, and it has caused us pain, so our v2 branch includes a backport of the Python 3 selectors module and a hand-written, partially complete event loop that only handles the specific cases we need. This is an extra thing for us to debug and maintain, and ultimately it’d be easier to just delegate the whole thing to event loops written by others who promise to maintain them and make them efficient.

The second answer is that I believe good asyncio support in libraries is a vital part of the future of this language, and “good” asyncio support IMO does as little as possible to block the main event loop. Running all of the complex protocol parsing and state manipulation of the Requests stack on a background thread is not cheap, and involves a lot of GIL swapping around. We have found several bug reports complaining about using Requests with largish numbers of threads, indicating that our big stack of Python code really does cause contention on the GIL if used heavily. In general, having to defer to a thread to run *Python* code in asyncio is IMO a nasty anti-pattern that should be avoided where possible. It is much less bad to defer to a thread to then block on a syscall (e.g. to get an “async” getaddrinfo), but doing so to run a big stack of Python code is vastly less pleasant for the main event loop.
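To illustrate the distinction (my example, not Cory's code): deferring a blocking syscall to the executor keeps the worker thread blocked in C, where it barely competes with the event loop for the GIL.

```python
import asyncio
import socket

async def resolve(host, port):
    # The "less bad" pattern: the worker thread blocks inside a C-level
    # syscall rather than running a large stack of Python code.
    loop = asyncio.get_event_loop()
    return await loop.run_in_executor(None, socket.getaddrinfo, host, port)
```

(asyncio's own `loop.getaddrinfo()` does roughly this for you.)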

For this reason, we’d ideally treat asyncio as the first-class citizen and retrofit on the threaded support, rather than the other way around. This goes doubly so when you consider the other reasons for wanting to use asyncio.

The third answer is that HTTP/2 makes all of this much harder. HTTP/2 is a *highly* concurrent protocol. Connections send a lot of control frames back and forth that are invisible to the user working at the semantic HTTP level but that nonetheless need relatively low-latency turnaround (e.g. PING frames). It turns out that in the traditional synchronous HTTP model urllib3 only gets access to the socket to do work when the user calls into our code. If the user goes a “long” time without calling into urllib3, we take a long time to process any data off the connection. In the best case this causes latency spikes as we process all the data that queued up in the socket. In the worst case, this causes us to lose connections we should have been able to keep because we failed to respond to a PING frame in a timely manner.

My experience is that purely synchronous libraries handling HTTP/2 simply cannot provide a positive user experience. HTTP/2 flat-out *requires* either an event loop or a dedicated background thread, and in practice in your dedicated background thread you’d also just end up writing an event loop (see answer 1 again). For this reason, it is basically mandatory for HTTP/2 support in Python to either use an event loop or to spawn out a dedicated C thread that does not hold the GIL to do the I/O (as this thread will be regularly woken up to handle I/O events).
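As a sketch of what that event loop (or its moral equivalent) has to do — assuming the h2 library and an established asyncio stream pair — a dedicated task must keep servicing the connection even while user code is idle:

```python
import asyncio

async def service_connection(reader, writer, conn):
    """Keep an HTTP/2 connection healthy independently of user calls.
    ``conn`` is assumed to be an h2.connection.H2Connection; feeding it
    bytes makes it queue automatic replies (e.g. PING ACKs)."""
    while True:
        data = await reader.read(65535)
        if not data:
            break
        conn.receive_data(data)
        outbound = conn.data_to_send()
        if outbound:
            writer.write(outbound)
            await writer.drain()
```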

Hopefully this (admittedly horrifyingly long) response helps illuminate why we’re interested in asyncio support. It should be noted that if we find ourselves unable to get it in the short term we may simply resort to offering an “async” API that involves us doing the rough equivalent of running in a thread-pool executor, but I won’t be thrilled about it. ;)
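That fallback would look roughly like this (with a hypothetical blocking `session.request`):

```python
import asyncio
from functools import partial

async def async_request(session, method, url, **kwargs):
    # An "async" facade over a blocking client: it works, but it pushes a
    # large stack of Python code onto a worker thread, with the GIL
    # contention described above.
    loop = asyncio.get_event_loop()
    call = partial(session.request, method, url, **kwargs)
    return await loop.run_in_executor(None, call)
```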

Cory 



--
--Guido van Rossum (python.org/~guido)
