[Async-sig] async/sync library reusage

Fri Jun 9 14:23:53 EDT 2017

Great write-up! I actually find the async nature of HTTP (both versions) a
compelling reason to switch to asyncio. For HTTP/1.1 this sounds mostly
like it would make the implementation easier; for HTTP/2 it sounds like it
would just be better on the user side as well (if the user just wants one
resource, they can safely continue to use the synchronous HTTP/1.1 version
of the API).

On Fri, Jun 9, 2017 at 9:55 AM, Cory Benfield <cory at lukasa.co.uk> wrote:

>
> On 9 Jun 2017, at 17:28, Guido van Rossum <guido at python.org> wrote:
>
> At least one of us is still confused. The one-event-loop-per-thread model
> is supported in asyncio without passing the loop around explicitly. The
> get_event_loop() implementation stores all its state in a thread-local
> instance, so it returns the thread's event loop. (Because this is an
> "advanced" model, you have to explicitly create the event loop with
> new_event_loop() and make it the default loop for the thread with
> set_event_loop().)
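Below is a minimal sketch of the one-event-loop-per-thread model described above. It assumes Python 3.7+, and the helper names `run_in_own_loop` and `demo` are illustrative only. Each thread creates its loop with `new_event_loop()` and registers it with `set_event_loop()`. After that, `get_event_loop()` returns that thread's loop without the loop being passed around:

```python
import asyncio
import threading


def run_in_own_loop(results, index):
    # The "advanced" model: create a loop explicitly and make it this
    # thread's default loop.
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    try:
        # get_event_loop() reads per-thread state, so it returns the loop
        # registered above without the loop being passed in.
        found = asyncio.get_event_loop()

        async def current_loop():
            return asyncio.get_running_loop()

        running = loop.run_until_complete(current_loop())
        results[index] = (found is loop, running is loop, loop)
    finally:
        asyncio.set_event_loop(None)
        loop.close()


def demo(n_threads=4):
    results = [None] * n_threads
    threads = [
        threading.Thread(target=run_in_own_loop, args=(results, i))
        for i in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```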
>
>
> Aha, ok, so the confused one is me. I did not know this. =) That
> definitely works a lot better. It admittedly works less well if someone is
> doing their own custom event loop stuff, but that’s probably an acceptable
> limitation up until the time that Python 2 goes quietly into the night.
>
> All in all, I'm a bit curious why you would need to use asyncio at all
> when you've got a thread per request anyway.
>
>
> Yeah, so this is a bit of a diversion from the original topic of this
> thread but I think it’s an idea worth discussing in this space. I want to
> reframe the question a bit if you don’t mind, so shout if you think I’m not
> responding to quite what you were asking. In my understanding, the question
> you’re implicitly asking is this:
>
> “If you have a thread-safe library today (that is, one that allows users
> to do threaded I/O with appropriate resource pooling and management), why
> move to a model built on asyncio?”
>
> There are many answers to this question that differ for different
> libraries with different uses, but for HTTP libraries like urllib3 here are
> our reasons.
>
> The first is that it turns out that even for HTTP/1.1 you need to write
> something that amounts to a partial event loop to properly handle the
> protocol. Good HTTP clients need to watch for responses while they’re
> uploading body data, because if a response arrives during that process, the
> body upload should be terminated immediately. This is also required for
> sensibly handling things like Expect: 100-continue, as well as for spotting
> other intermediate responses and connection teardowns gracefully and without
> throwing exceptions.
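Below is a minimal asyncio sketch of watching for a response while uploading body data. It assumes Python 3.7+. `write_chunk` is a hypothetical coroutine standing in for whatever actually writes to the connection. The function stops on any incoming status line. A real client would also parse headers and treat `100 Continue` as a signal to keep sending rather than to stop:

```python
import asyncio


async def send_body_watching_for_response(reader, write_chunk, chunks):
    """Upload `chunks`, but abandon the upload if the server responds early.

    `reader` is an asyncio.StreamReader for the connection; `write_chunk` is
    a hypothetical coroutine function that sends one chunk of body data.
    Returns (chunks_sent, first_response_line).
    """
    sent = 0

    async def upload():
        nonlocal sent
        for chunk in chunks:
            await write_chunk(chunk)
            sent += 1

    upload_task = asyncio.create_task(upload())
    response_task = asyncio.create_task(reader.readline())
    done, _ = await asyncio.wait(
        {upload_task, response_task}, return_when=asyncio.FIRST_COMPLETED
    )

    if response_task in done:
        # The server answered before the body was finished: stop uploading
        # immediately instead of pushing the rest of the body.
        upload_task.cancel()
        await asyncio.wait({upload_task})  # let the cancellation take effect
        return sent, response_task.result()

    # The upload finished first: surface any upload error, then wait for
    # the response as usual.
    try:
        upload_task.result()
    except BaseException:
        response_task.cancel()
        raise
    return sent, await response_task
```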
>
> Today urllib3 does not do this, and it has caused us pain, so our v2
> branch includes a backport of the Python 3 selectors module and a
> hand-written partially-complete event loop that only handles the specific
> cases we need. This is an extra thing for us to debug and maintain, and
> ultimately it’d be easier to just delegate the whole thing to event loops
> written by others who promise to maintain them and make them efficient.
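For comparison, here is a rough sketch of the kind of hand-written, partial event loop described above, built on the standard `selectors` module. It is not urllib3's v2 code. The function name and the "stop on any incoming data" policy are assumptions for illustration:

```python
import selectors
import socket


def send_body_with_selector(sock, body, chunk_size=4096, timeout=5.0):
    """Send `body` on `sock`, but stop as soon as the peer sends anything.

    Returns (bytes_sent, early_data); early_data is b"" if the whole body
    was sent without the peer saying anything.
    """
    sock.setblocking(False)
    view = memoryview(body)
    sel = selectors.DefaultSelector()
    sel.register(sock, selectors.EVENT_READ | selectors.EVENT_WRITE)
    sent = 0
    try:
        while sent < len(view):
            events = sel.select(timeout)
            if not events:
                raise TimeoutError("socket not ready within timeout")
            for _key, mask in events:
                # Check readability first so an early response wins over
                # pushing more body data.
                if mask & selectors.EVENT_READ:
                    data = sock.recv(65536)
                    if not data:
                        raise ConnectionError("peer closed the connection mid-upload")
                    return sent, data
                if mask & selectors.EVENT_WRITE:
                    try:
                        sent += sock.send(view[sent:sent + chunk_size])
                    except BlockingIOError:
                        pass  # spurious wakeup; select again
        return sent, b""
    finally:
        sel.unregister(sock)
        sel.close()
```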
>
> The second answer is that I believe good asyncio support in libraries is a
> vital part of the future of this language, and “good” asyncio support IMO
> does as little as possible to block the main event loop. Running all of the
> complex protocol parsing and state manipulation of the Requests stack on a
> background thread is not cheap, and involves a lot of GIL swapping around.
> We have found several bug reports complaining about using Requests with
> largish numbers of threads, indicating that our big stack of Python code
> really does cause contention on the GIL if used heavily. In general, having
> to defer to a thread to run *Python* code in asyncio is IMO a nasty
> anti-pattern that should be avoided where possible. It is much less bad to
> defer to a thread to then block on a syscall (e.g. to get an “async”
> getaddrinfo), but doing so to run a large stack of Python code is vastly
> less pleasant for the main event loop.
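The acceptable pattern defers only a blocking syscall to a thread. It can be sketched as follows (Python 3.7+; `resolve` is an illustrative name). asyncio's built-in `loop.getaddrinfo()` already delegates to the default executor in a similar way:

```python
import asyncio
import socket


async def resolve(host, port):
    """Resolve host/port without blocking the event loop."""
    loop = asyncio.get_running_loop()
    # Only the blocking getaddrinfo() call runs on the default thread pool;
    # the calling coroutine (and all other Python logic) stays on the loop.
    infos = await loop.run_in_executor(
        None, socket.getaddrinfo, host, port, 0, socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr).
    return [sockaddr for _family, _type, _proto, _canon, sockaddr in infos]
```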
>
> For this reason, we’d ideally treat asyncio as the first-class citizen and
> retrofit on the threaded support, rather than the other way around. This
> goes doubly so when you consider the other reasons for wanting to use
> asyncio.
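One way to read "asyncio first, threaded support retrofitted" is an async core with a thin synchronous facade. The minimal sketch below follows that reading. `AsyncClient` and `SyncClient` are hypothetical classes, not anything in urllib3 or Requests:

```python
import asyncio


class AsyncClient:
    """Hypothetical async-first core: all protocol logic would live here."""

    async def fetch(self, path):
        await asyncio.sleep(0)  # stand-in for real non-blocking I/O
        return f"response for {path}"


class SyncClient:
    """Thin synchronous facade retrofitted on top of the async core."""

    def __init__(self):
        self._core = AsyncClient()
        # One private loop per facade instance, driven only by the thread
        # that calls fetch().
        self._loop = asyncio.new_event_loop()

    def fetch(self, path):
        return self._loop.run_until_complete(self._core.fetch(path))

    def close(self):
        self._loop.close()

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()
```

`run_until_complete()` raises `RuntimeError` if another event loop is already running in the calling thread. A facade like this therefore must not be called from inside async code.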
>
> The third answer is that HTTP/2 makes all of this much harder. HTTP/2 is a
> *highly* concurrent protocol. Connections send a lot of control frames back
> and forth that are invisible to the user working at the semantic HTTP level
> but that nonetheless need relatively low-latency turnaround (e.g. PING
> frames). It turns out that in the traditional synchronous HTTP model,
> urllib3 only gets access to the socket to do work when the user calls into
> our code. If the user goes a “long” time without calling into urllib3, we
> take a long time to process any data off the connection. In the best case
> this causes latency spikes as we process all the data that queued up in the
> socket. In the worst case, this causes us to lose connections we should
> have been able to keep because we failed to respond to a PING frame in a
> timely manner.
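The sketch below shows why an event loop helps here. A reader task that runs for the lifetime of the connection can answer PING frames as soon as they arrive, regardless of when the user next calls into the library. The frame layout (9-byte header, PING type `0x6`, ACK flag `0x1`) follows RFC 7540. Everything else is a simplifying assumption:
- `write` and `on_frame` are hypothetical callbacks.
- SETTINGS, flow control, HPACK and frame validation are all omitted.
- In a real client the coroutine would be started with `asyncio.create_task()` when the connection opens.

```python
import asyncio

PING = 0x6
ACK = 0x1


def encode_frame(frame_type, flags, stream_id, payload):
    # 9-byte frame header: 24-bit length, 8-bit type, 8-bit flags,
    # 1 reserved bit + 31-bit stream identifier.
    return (
        len(payload).to_bytes(3, "big")
        + bytes([frame_type, flags])
        + (stream_id & 0x7FFFFFFF).to_bytes(4, "big")
        + payload
    )


async def connection_reader(reader, write, on_frame):
    """Read frames as they arrive; ACK PINGs at once, pass the rest on."""
    while True:
        try:
            header = await reader.readexactly(9)
        except asyncio.IncompleteReadError:
            return  # connection closed
        length = int.from_bytes(header[:3], "big")
        frame_type, flags = header[3], header[4]
        stream_id = int.from_bytes(header[5:9], "big") & 0x7FFFFFFF
        payload = await reader.readexactly(length)
        if frame_type == PING and not flags & ACK:
            # Echo the opaque payload back with the ACK flag set.
            write(encode_frame(PING, ACK, 0, payload))
        else:
            on_frame(frame_type, flags, stream_id, payload)
```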
>
> My experience is that purely synchronous libraries handling HTTP/2 simply
> cannot provide a positive user experience. HTTP/2 flat-out *requires*
> either an event loop or a dedicated background thread, and in practice in
> your dedicated background thread you’d also just end up writing an event
> loop (see answer 1 again). For this reason, it is basically mandatory for
> HTTP/2 support in Python to either use an event loop or to spawn out a
> dedicated C thread that does not hold the GIL to do the I/O (as this thread
> will be regularly woken up to handle I/O events).
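The "dedicated background thread" option can be sketched as an asyncio loop running on its own thread. Synchronous callers submit work to it via `asyncio.run_coroutine_threadsafe()`. `BackgroundLoop` is a hypothetical helper. This is a Python thread that holds the GIL whenever it runs Python code, so it does not avoid the contention described above:

```python
import asyncio
import threading


class BackgroundLoop:
    """Runs an asyncio loop on a daemon thread so connection housekeeping
    (e.g. answering PINGs) can proceed between user calls."""

    def __init__(self):
        self.loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        asyncio.set_event_loop(self.loop)
        self.loop.run_forever()

    def call(self, coro, timeout=None):
        # Schedule the coroutine on the background loop and block the
        # calling (synchronous) thread until it finishes.
        return asyncio.run_coroutine_threadsafe(coro, self.loop).result(timeout)

    def close(self):
        self.loop.call_soon_threadsafe(self.loop.stop)
        self._thread.join()
        self.loop.close()
```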
>
> Hopefully this (admittedly horrifyingly long) response helps illuminate
> why we’re interested in asyncio support. It should be noted that if we find
> ourselves unable to get it in the short term we may simply resort to
> offering an “async” API that involves us doing the rough equivalent of
> running in a thread-pool executor, but I won’t be thrilled about it. ;)
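The fallback mentioned here is an "async" API that simply runs the existing synchronous code in a thread-pool executor. It might look roughly like this. `blocking_fetch` is a hypothetical stand-in for the synchronous request function:

```python
import asyncio
import time


def blocking_fetch(url):
    """Hypothetical stand-in for an existing synchronous request function."""
    time.sleep(0.01)  # pretend to do blocking network I/O
    return f"body of {url}"


async def async_fetch(url, *, executor=None):
    loop = asyncio.get_running_loop()
    # The whole synchronous stack runs on a pool thread; the event loop
    # only awaits the result.
    return await loop.run_in_executor(executor, blocking_fetch, url)
```

On Python 3.9+, `asyncio.to_thread(blocking_fetch, url)` is a shorthand for the same idea. Either way, the whole synchronous stack still runs as Python code on a pool thread. That is exactly the GIL-heavy pattern the message calls an anti-pattern.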
>
> Cory
>

-- 
--Guido van Rossum (python.org/~guido)