manuel miranda
Mon Jun 12 17:20:17 EDT 2017

So, I've been playing a bit with the information I saw in this thread
(thank you all for the responses) and I got something super simple working:

What I like about this (and that's what I was aiming for) is that the user
uses the same class/interface no matter whether it's inside the asyncio world or not.
So both `await fn()` and `fn()` work, producing the expected results.
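One possible way to let a single definition serve both `await fn()` and `fn()` is a decorator that checks whether an event loop is running in the current thread. This is a hypothetical sketch, not necessarily the approach described above. The `dual` decorator and `fetch_value` function are illustrative names, and the sketch relies on `asyncio.run()` and `asyncio.get_running_loop()` (Python 3.7+), which are newer than this thread.

```python
import asyncio
import functools


def dual(coro_fn):
    """Hypothetical decorator: one coroutine function, two calling styles.

    Inside a running asyncio loop, the call returns the coroutine, which the
    caller awaits. Outside any loop, the coroutine is driven to completion
    with asyncio.run() and the plain result is returned.
    """
    @functools.wraps(coro_fn)
    def wrapper(*args, **kwargs):
        coro = coro_fn(*args, **kwargs)
        try:
            asyncio.get_running_loop()
        except RuntimeError:
            # No loop running in this thread: act like a sync function.
            return asyncio.run(coro)
        # A loop is running: hand back the awaitable.
        return coro
    return wrapper


@dual
async def fetch_value(x):
    await asyncio.sleep(0)  # stand-in for real async I/O
    return x * 2


async def main():
    # asyncio world: the same name is awaited.
    return await fetch_value(21)


if __name__ == "__main__":
    print(fetch_value(21))      # sync world: plain call, prints 42
    print(asyncio.run(main()))  # async world, prints 42
```

Each sync call creates and tears down a fresh event loop, which is one concrete source of overhead. Also, plain (non-awaiting) code that happens to run while a loop is active gets back an un-awaited coroutine, so the decorator cannot hide the distinction completely.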

Now some cons (which, in the case of my library, are acceptable):

- This aims only at asyncio compatibility; other async frameworks like
trio, curio, etc. wouldn't work
- No Python 2 compatibility (although Nathaniel's idea of bleaching could
still be applied)
- I guess it adds some overhead to both the sync and async versions. I will do
some benchmarking when I have time (this one will actually be the one
deciding whether I do the integration or not)

And some pros:

- User is agnostic to the async/sync implementation. If you are in the asyncio
world, just use `await fn()`; if not, use `fn()`. Both will work
- Classes using this approach are compatible with each other
- No duplication of code

I haven't yet thought about async context managers, async iteration and so on, but I
guess there is a way to handle those too (or not, I have no idea).
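For context managers, one hypothetical way to follow the same pattern is to keep the real logic in coroutine methods, implement `__aenter__`/`__aexit__` by awaiting them, and implement `__enter__`/`__exit__` by running them with `asyncio.run()`. The `Resource` class and its `aopen`/`aclose` methods are illustrative names only. Iteration would pair `__aiter__`/`__anext__` with `__iter__`/`__next__` in the same way.

```python
import asyncio


class Resource:
    """Hypothetical resource usable with both `with` and `async with`."""

    def __init__(self):
        self.open = False

    # Shared logic lives in coroutine methods...
    async def aopen(self):
        await asyncio.sleep(0)  # stand-in for async setup I/O
        self.open = True

    async def aclose(self):
        await asyncio.sleep(0)  # stand-in for async teardown I/O
        self.open = False

    # ...which the async protocol awaits directly.
    async def __aenter__(self):
        await self.aopen()
        return self

    async def __aexit__(self, exc_type, exc, tb):
        await self.aclose()
        return False  # do not swallow exceptions

    # The sync protocol drives the same coroutines to completion.
    # Assumes no event loop is already running in this thread.
    def __enter__(self):
        asyncio.run(self.aopen())
        return self

    def __exit__(self, exc_type, exc, tb):
        asyncio.run(self.aclose())
        return False


async def use_async():
    async with Resource() as r:
        return r.open


if __name__ == "__main__":
    with Resource() as r:
        print(r.open)            # True
    print(r.open)                # False
    print(asyncio.run(use_async()))  # True
```

Because each `asyncio.run()` call uses its own short-lived loop, this only suits resources that are not bound to a particular loop; a socket or stream opened in `__enter__` would need one loop kept alive until `__exit__`.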

One fun question in all this is whether it's possible (meaning easy) to also reuse
the tests to test both the sync and the async versions... :rolling_eyes:
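One way to reuse a single test body for both versions is to parametrize the test over the implementation and a small runner that knows how to call it. This is a sketch using only the standard `unittest` module; `async_add` and `sync_add` are hypothetical functions standing in for a library's real API.

```python
import asyncio
import unittest


async def async_add(a, b):
    # Hypothetical async implementation under test.
    await asyncio.sleep(0)
    return a + b


def sync_add(a, b):
    # Hypothetical sync entry point wrapping the same coroutine.
    return asyncio.run(async_add(a, b))


def call_sync(fn, *args):
    return fn(*args)


def call_async(fn, *args):
    # Drive the coroutine to completion from outside any event loop.
    return asyncio.run(fn(*args))


FLAVOURS = [("sync", sync_add, call_sync), ("async", async_add, call_async)]


class AddTests(unittest.TestCase):
    CASES = [(2, 3, 5), (-1, 1, 0)]

    def test_both_flavours(self):
        # The same assertions run once per implementation.
        for name, fn, call in FLAVOURS:
            for a, b, expected in self.CASES:
                with self.subTest(flavour=name, a=a, b=b):
                    self.assertEqual(call(fn, a, b), expected)
```

Run it with `python -m unittest` as usual; a failure is reported per flavour and case thanks to `subTest`.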

On Fri, Jun 9, 2017 at 9:52 PM Yarko Tymciurak <yarkot1 at gmail.com> wrote:

> ...so I really am enjoying the conversation.
> Guido - re: "vision too far out":  yes, for people trying to struggle w/
> async support in their libraries, now... but that is also part of my
> motivation.   Python 5?  Sure...  (I may have to watch it come into use from
> the grave, but hopefully not... ;-) ).  Anyway, from back-porting and
> tactical "implement now" concerns, to plans for the next release, to plans for
> the next version of Python, to brainstorming much less concrete future versions:
> all form an interesting continuum.
> Re:  GIL... sure, sort of, and sort of not.  I was thinking "as long as
> major changes are going on...  think about additional structural
> changes..."   More to the point:  as I see it, people have a hard time
> thinking about async in the cooperative-multitasking (CMT) sense, and thus
> disappointments happen around blocking (missed or unexpected, e.g. hardware
> failures).   Cory (in his reply - and, yeah: nice writeup!) hints at what I
> generally structurally like:
> "...we’d ideally treat asyncio as the first-class citizen and retrofit on
> the threaded support, rather than the other way around"
> Structurally,  async has lightweight overhead compared to threads, which
> are lightweight compared to processes, and so a natural app flow
> seems to run from the lightest weight outward.  To me, this seems practical for making
> life easier for developers, because you can imagine "promoting" an async
> task caught unexpectedly blocking, to a thread, while still having the
> lightest-weight loop have control over it (promotion out, as well as
> cancellation while promoted).
> As for multiple task loops, or loops off in a thread, I haven't thought
> about it too much, but this seems neither new nor unreasonable.  I'm
> thinking of the base-stations we talk over in our mobile connections, which
> are multiple diskless servers, and hot-promote to "master" server status on
> hardware failure (or live capacity upgrade, i.e. inserting processors).
> This pattern seems both reasonable and useful in this context, i.e. the
> concept of a master loop (which implies communication/control channels - a
> complication).  With some thought and some reasonable ground rules and
> simplifications, I would expect much can be done.
> Appreciate the discussions!
> - Yarko
> On Fri, Jun 9, 2017 at 1:23 PM, Guido van Rossum <guido at python.org> wrote:
>> Great write-up! I actually find the async nature of HTTP (both versions)
>> a compelling reason to switch to asyncio. For HTTP/1.1 this sounds mostly
>> like it would make the implementation easier; for HTTP/2 it sounds like it
>> would just be better for the user-side as well (if the user just wants one
>> resource they can safely continue to use the synchronous HTTP/1.1 version
>> of the API.)
>> On Fri, Jun 9, 2017 at 9:55 AM, Cory Benfield <cory at lukasa.co.uk> wrote:
>>> On 9 Jun 2017, at 17:28, Guido van Rossum <guido at python.org> wrote:
>>>> At least one of us is still confused. The one-event-loop-per-thread
>>>> model is supported in asyncio without passing the loop around explicitly.
>>>> The get_event_loop() implementation stores all its state in a thread-local
>>>> instance, so it returns the thread's event loop. (Because this is an
>>>> "advanced" model, you have to explicitly create the event loop with
>>>> new_event_loop() and make it the default loop for the thread with
>>>> set_event_loop().)
>>> Aha, ok, so the confused one is me. I did not know this. =) That
>>> definitely works a lot better. It admittedly works less well if someone is
>>> doing their own custom event loop stuff, but that’s probably an acceptable
>>> limitation up until the time that Python 2 goes quietly into the night.
>>>> All in all, I'm a bit curious why you would need to use asyncio at all
>>>> when you've got a thread per request anyway.
>>> Yeah, so this is a bit of a diversion from the original topic of this
>>> thread but I think it’s an idea worth discussing in this space. I want to
>>> reframe the question a bit if you don’t mind, so shout if you think I’m not
>>> responding to quite what you were asking. In my understanding, the question
>>> you’re implicitly asking is this:
>>> “If you have a thread-safe library today (that is, one that allows users
>>> to do threaded I/O with appropriate resource pooling and management), why
>>> move to a model built on asyncio?”
>>> There are many answers to this question that differ for different
>>> libraries with different uses, but for HTTP libraries like urllib3 here are
>>> our reasons.
>>> The first is that it turns out that even for HTTP/1.1 you need to write
>>> something that amounts to a partial event loop to properly handle the
>>> protocol. Good HTTP clients need to watch for responses while they’re
>>> uploading body data, because if a response arrives during that process the body
>>> upload should be terminated immediately. This is also required for sensibly
>>> handling things like Expect: 100-continue, as well as spotting other
>>> intermediate responses and connection teardowns sensibly and without
>>> throwing exceptions.
>>> Today urllib3 does not do this, and it has caused us pain, so our v2
>>> branch includes a backport of the Python 3 selectors module and a
>>> hand-written partially-complete event loop that only handles the specific
>>> cases we need. This is an extra thing for us to debug and maintain, and
>>> ultimately it’d be easier to just delegate the whole thing to event loops
>>> written by others who promise to maintain them and make them efficient.
>>> The second answer is that I believe good asyncio support in libraries is
>>> a vital part of the future of this language, and “good” asyncio support IMO
>>> does as little as possible to block the main event loop. Running all of the
>>> complex protocol parsing and state manipulation of the Requests stack on a
>>> background thread is not cheap, and involves a lot of GIL swapping around.
>>> We have found several bug reports complaining about using Requests with
>>> largish numbers of threads, indicating that our big stack of Python code
>>> really does cause contention on the GIL if used heavily. In general, having
>>> to defer to a thread to run *Python* code in asyncio is IMO a nasty
>>> anti-pattern that should be avoided where possible. It is much less bad to
>>> defer to a thread to then block on a syscall (e.g. to get an “async”
>>> getaddrinfo), but doing so to run a big stack of Python code is vastly
>>> less pleasant for the main event loop.
>>> For this reason, we’d ideally treat asyncio as the first-class citizen
>>> and retrofit on the threaded support, rather than the other way around.
>>> This goes doubly so when you consider the other reasons for wanting to use
>>> asyncio.
>>> The third answer is that HTTP/2 makes all of this much harder. HTTP/2 is
>>> a *highly* concurrent protocol. Connections send a lot of control frames
>>> back and forth that are invisible to the user working at the semantic HTTP
>>> level but that nonetheless need relatively low-latency turnaround (e.g.
>>> PING frames). It turns out that in the traditional synchronous HTTP model
>>> urllib3 only gets access to the socket to do work when the user calls into
>>> our code. If the user goes a “long” time without calling into urllib3, we
>>> take a long time to process any data off the connection. In the best case
>>> this causes latency spikes as we process all the data that queued up in the
>>> socket. In the worst case, this causes us to lose connections we should
>>> have been able to keep because we failed to respond to a PING frame in a
>>> timely manner.
>>> My experience is that purely synchronous libraries handling HTTP/2
>>> simply cannot provide a positive user experience. HTTP/2 flat-out
>>> *requires* either an event loop or a dedicated background thread, and in
>>> practice in your dedicated background thread you’d also just end up writing
>>> an event loop (see answer 1 again). For this reason, it is basically
>>> mandatory for HTTP/2 support in Python to either use an event loop or to
>>> spawn out a dedicated C thread that does not hold the GIL to do the I/O (as
>>> this thread will be regularly woken up to handle I/O events).
>>> Hopefully this (admittedly horrifyingly long) response helps illuminate
>>> why we’re interested in asyncio support. It should be noted that if we find
>>> ourselves unable to get it in the short term we may simply resort to
>>> offering an “async” API that involves us doing the rough equivalent of
>>> running in a thread-pool executor, but I won’t be thrilled about it. ;)
>>> Cory
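Cory's first point, watching for a response while the request body is still being uploaded, comes down to waiting on readability and writability at the same time. A minimal sketch with the standard-library `selectors` module (the module he mentions backporting) follows. It is not urllib3's implementation; the `upload_watching_for_response` name, the simulated 413 reply and the `socket.socketpair()` demo are illustrative assumptions.

```python
import selectors
import socket


def upload_watching_for_response(sock, body, chunk_size=4096, timeout=5.0):
    """Send `body` on `sock`, but stop as soon as the peer starts replying.

    Returns (bytes_sent, early_reply). early_reply is None if the whole body
    went out first, b"" if the peer closed the connection mid-upload, and the
    received bytes (e.g. an early error status) otherwise.
    """
    sock.setblocking(False)
    sel = selectors.DefaultSelector()
    sel.register(sock, selectors.EVENT_READ | selectors.EVENT_WRITE)
    sent = 0
    try:
        while sent < len(body):
            ready = sel.select(timeout)
            if not ready:
                raise TimeoutError("peer neither read the body nor replied")
            for _key, mask in ready:
                if mask & selectors.EVENT_READ:
                    # Reply (or teardown) arrived mid-upload: stop sending.
                    return sent, sock.recv(65536)
                if mask & selectors.EVENT_WRITE:
                    try:
                        sent += sock.send(body[sent:sent + chunk_size])
                    except BlockingIOError:
                        pass  # spurious wakeup; select again
        return sent, None
    finally:
        sel.unregister(sock)
        sel.close()


if __name__ == "__main__":
    # Demo: a simulated server rejects the upload before reading any of it.
    client, server = socket.socketpair()
    server.sendall(b"HTTP/1.1 413 Payload Too Large\r\n\r\n")
    sent, early = upload_watching_for_response(client, b"x" * 1_000_000)
    print(sent, early)
    client.close()
    server.close()
```

A real client would feed the early bytes to its HTTP parser and, for `Expect: 100-continue`, would also distinguish an interim `100 Continue` (keep uploading) from a final status (stop); that bookkeeping is where such a hand-written loop starts to grow.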

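The distinction in Cory's second answer and in his closing fallback can be sketched with `loop.run_in_executor()`: handing a worker thread a single blocking syscall such as `socket.getaddrinfo()` is cheap, while running a whole synchronous client stack in a thread pool keeps that Python code competing for the GIL. In this sketch, `blocking_fetch` is a hypothetical stand-in for an existing sync API, and `asyncio.run()`/`asyncio.get_running_loop()` require Python 3.7+.

```python
import asyncio
import socket
import time


def blocking_fetch(host, port):
    """Hypothetical stand-in for an existing synchronous client call."""
    time.sleep(0.1)  # pretend to block on network I/O
    return f"response from {host}:{port}"


async def resolve(host, port):
    # Acceptable: the worker thread only blocks inside the getaddrinfo() call.
    # (asyncio's own loop.getaddrinfo() is built the same way.)
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, socket.getaddrinfo, host, port)


async def fetch_async(host, port):
    # The "async API over a thread pool" fallback: it works, but all of the
    # sync stack's Python code runs in worker threads and contends for the GIL.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, blocking_fetch, host, port)


async def main():
    # Numeric host so resolution does not need DNS.
    infos = await resolve("127.0.0.1", 80)
    replies = await asyncio.gather(
        *(fetch_async("example.invalid", 8000 + i) for i in range(3))
    )
    return infos, replies


if __name__ == "__main__":
    infos, replies = asyncio.run(main())
    print(infos[0][4])  # sockaddr of the first resolved address
    print(replies)
```

With the default executor the three `fetch_async` calls overlap, so the batch should take roughly 0.1 s rather than 0.3 s; the cost Cory describes shows up when the threaded code is CPU-heavy Python rather than a sleeping syscall.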