[Web-SIG] WSGI and asyncio (tulip)?

Tue Oct 14 18:47:35 CEST 2014

I am fascinated by the new discussions about WSGI and HTTP/2. I don't have much to
contribute, because my own experience with web development is either very
old (when CGI was new and exciting) or uses corporate frameworks where
there's a huge set of layers between the app code and the external network
(e.g. Google, Dropbox).

I have strong emotional responses to some of the discussion topics. For
example, I feel that REMOTE_ADDR should represent the public IP address from
which the request originated, not the internal address of a reverse proxy:
my use cases for it are all about blocking or rate-limiting specific
clients, and I assume the network between the reverse proxy and the app
server is secure. But I am sure there are already enough voices here, and I
trust that the SIG will come up with the right answers (even if they
override my gut feelings!).

My most recent foray into web stuff was writing a small web crawler as an
example for asyncio (PEP 3156, a.k.a. tulip). The crawler is written in
Python 3; the source code is at
https://github.com/aosabook/500lines/blob/master/crawler/crawling.py. It
supports several advanced HTTP features: TLS, connection reuse, chunked
transfer encoding, and redirects (but not compression; I think it would be
straightforward to add, but the code would then exceed the 500-line limit
imposed by the book).

Perhaps the main lesson I learned from writing this is how different yet
similar web code looks when you use an asynchronous framework. This makes
me wonder: can there be a future where you can write your web app as an
asyncio coroutine?

It looks like the WSGI protocol already supports asynchronously producing
the response (using yield), and I don't think asyncio would have a problem
with converting this convention to its own "yield from" convention.
However, the situation is different for reading the request body. The app
can read this in dribs and drabs if it wants to, by reading limited amounts
of data from environ['wsgi.input'], but in an asyncio-driven world reading
operations should really be marked with "yield from" (or perhaps just yield
-- again, I'm sure an adaptor won't have a problem with this).
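To make the response side concrete, here is a minimal sketch, assuming an asyncio server that has already parsed the request. It runs an ordinary WSGI app, records the status and headers passed to start_response(), and copies each chunk the app yields into an asyncio-style writer, awaiting drain() between chunks. The names run_wsgi_app, hello_app and BufferWriter are hypothetical; BufferWriter is an in-memory stand-in offering the same write()/drain() pair as asyncio.StreamWriter, and the sketch does not serialize the status and headers onto the wire. Note that any read the app makes from environ['wsgi.input'] would still block the event loop, which is the problem described above.

```python
import asyncio
import io


def hello_app(environ, start_response):
    """An ordinary synchronous WSGI app that produces its body lazily with yield."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    yield b"Hello, "
    yield b"world!\n"


class BufferWriter:
    """Hypothetical in-memory stand-in for asyncio.StreamWriter (write + drain)."""

    def __init__(self):
        self.buffer = bytearray()

    def write(self, data):
        self.buffer.extend(data)

    async def drain(self):
        # A real StreamWriter would wait here for the transport buffer to empty.
        await asyncio.sleep(0)


async def run_wsgi_app(app, environ, writer):
    """Hypothetical adapter: drive a WSGI app from inside an asyncio coroutine."""
    response = {}

    def start_response(status, headers, exc_info=None):
        response["status"] = status
        response["headers"] = headers
        return writer.write  # PEP 3333 requires returning a write() callable

    body = app(environ, start_response)
    try:
        # A generator app calls start_response() when first advanced here.
        for chunk in body:
            writer.write(chunk)
            await writer.drain()  # let the event loop run between chunks
    finally:
        if hasattr(body, "close"):
            body.close()
    return response


writer = BufferWriter()
resp = asyncio.run(run_wsgi_app(hello_app, {"wsgi.input": io.BytesIO()}, writer))
print(resp["status"], bytes(writer.buffer))
```

The app itself is untouched: each yield simply becomes one write plus one await, which is why the response half of WSGI adapts easily.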

I'm wondering if a small extension to the WSGI protocol might be sufficient
to support this: the special environ variable "wsgi.async_input" could
optionally be tied to a standard asyncio stream reader
(https://docs.python.org/3/library/asyncio-stream.html#streamreader), from
which bytes can be read using "yield from stream.read([nbytes])" or "yield
from stream.readline()".
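A minimal sketch of an app using the proposed key, assuming a server that places an asyncio.StreamReader under environ['wsgi.async_input'] (the key is only the proposal above, not part of PEP 3333). Current Python spells "yield from" on a coroutine as "await", and the generator-based @asyncio.coroutine decorator was removed in Python 3.11, so the sketch uses async def. To stay self-contained, the reader is filled by hand with feed_data() and feed_eof() rather than connected to a socket; count_lines and demo are hypothetical names.

```python
import asyncio


async def count_lines(environ):
    """Hypothetical async handler: read the request body line by line."""
    reader = environ["wsgi.async_input"]  # proposed key, not in PEP 3333
    lines = 0
    while True:
        line = await reader.readline()  # suspends until a line or EOF arrives
        if not line:  # b"" signals end of stream
            break
        lines += 1
    return lines


async def demo():
    # Simulate the server: feed a request body into a standalone reader.
    reader = asyncio.StreamReader()
    reader.feed_data(b"first\nsecond\nthird")
    reader.feed_eof()
    return await count_lines({"wsgi.async_input": reader})


print(asyncio.run(demo()))
```

At EOF, readline() returns the trailing partial line (b"third") before returning b"", so all three lines are counted; reader.read(n) can be awaited the same way to consume the body in bounded chunks.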

Thinking a little more about this, it might be better if an async app could
be a regular asyncio coroutine. In that case, start_response() could return
an asyncio stream writer
(https://docs.python.org/3/library/asyncio-stream.html#streamwriter), and
the app would be expected to produce all its output by writing to it.
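A sketch of what such an app might look like, under loudly hypothetical conventions: the app is an async def taking (environ, start_response), start_response() returns an object with asyncio.StreamWriter's write()/drain() methods, and the app writes its whole body there instead of returning an iterable. None of this is a published interface. The server side (serve_once, MemoryWriter) is faked in memory so the example runs on its own, and it ignores HTTP framing details such as Content-Length or chunked encoding; a real server would hand back the connection's actual StreamWriter.

```python
import asyncio


class MemoryWriter:
    """Hypothetical stand-in for asyncio.StreamWriter, buffering in memory."""

    def __init__(self):
        self.data = bytearray()

    def write(self, data):
        self.data.extend(data)

    async def drain(self):
        await asyncio.sleep(0)  # a real writer applies flow control here


async def echo_app(environ, start_response):
    """Hypothetical async app: echo the request body back, upper-cased."""
    reader = environ["wsgi.async_input"]  # proposed key, not in PEP 3333
    writer = start_response("200 OK", [("Content-Type", "text/plain")])
    while True:
        chunk = await reader.read(1024)  # at most 1024 bytes per read
        if not chunk:  # b"" means the body is exhausted
            break
        writer.write(chunk.upper())
        await writer.drain()


async def serve_once(app, body):
    """Hypothetical server glue: run the app once against an in-memory request."""
    reader = asyncio.StreamReader()
    reader.feed_data(body)
    reader.feed_eof()
    writer = MemoryWriter()

    def start_response(status, headers):
        # Emit the status line and headers before any body bytes.
        head = [f"HTTP/1.1 {status}\r\n".encode("latin-1")]
        for name, value in headers:
            head.append(f"{name}: {value}\r\n".encode("latin-1"))
        head.append(b"\r\n")
        writer.write(b"".join(head))
        return writer

    await app({"wsgi.async_input": reader}, start_response)
    return bytes(writer.data)


print(asyncio.run(serve_once(echo_app, b"hello, asyncio")))
```

Because start_response() hands back the writer, headers are guaranteed to go out before the first body write, and backpressure comes from awaiting drain() rather than from the server pulling on an iterator.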

Anyway, I think I'm getting ahead of myself, but I do think it would be
nice if the next WSGI standard supported asyncio. For older Python versions
it could support Trollius instead
(https://pypi.python.org/pypi/trollius), a backport of asyncio that supports
Python 2.7 and 3.2 (and newer).

-- 
--Guido van Rossum (python.org/~guido)