[Web-SIG] A more Twisted approach to async apps in WSGI
Phillip J. Eby
pje at telecommunity.com
Thu Sep 23 03:56:36 CEST 2004
Hi all. I've been away for a few days due to loss of e-mail service when
my dedicated server lost a hard drive. Unfortunately my ISP didn't support
the OS version any more, so I had to rebuild everything for the new OS version.
Anyway, on to the topic of my post. Should 'wsgi.input' become an
iterator? Or should we develop a different API for asynchronous applications?
On the positive side of the iterator approach, it could make it easier for
asynchronous applications to pause waiting for input, and it could in
principle support "chunked" transfer encoding of the input stream.
However, since we last discussed this, I did some Googling on CGI and
chunked encoding. Far and away the most popular links regarding
chunked encoding and CGI are about bugs in IIS and Apache leading to
various vulnerabilities when chunked encoding is used. :(
Once you get past those items (e.g. by adding "-IIS -vulnerability" to your
search), you then find *our* discussion here on the Web-SIG! Finally,
digging further, I found some 1998 discussion from the IPP (Internet
Printing Protocol!) mailing list about which HTTP/1.1 servers support
chunked encoding for CGI and which don't.
The long and short of it is that CGI and chunked encoding are quite
simply incompatible, so a WSGI application that relied on chunked input
being available would be nonportable anyway.
That leaves the asynchronous use case, but the benefit is rather strained
at that point. Many frameworks reuse the 'cgi' module's 'FieldStorage'
class in order to parse browser input, and the 'cgi' module's
implementation requires an object with a 'readline()' method. That means
that if we switch from an input stream to an iterator, a lot of people are
going to be trying to make sensible wrappers to convert the iterator back
to an input stream, and that's just getting ridiculous, especially since in
many cases the server or gateway has a file-like object to start with.
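
To make the cost concrete, here is a rough sketch of the kind of wrapper a framework would have to carry if 'wsgi.input' were an iterator of byte chunks. The class name 'IteratorInput' and its internals are hypothetical, invented for illustration; nothing like it is part of the spec or of any proposal in this thread.

```python
class IteratorInput:
    """Hypothetical adapter: present an iterator of byte chunks as a stream."""

    def __init__(self, chunks):
        self._chunks = iter(chunks)
        self._buffer = b''

    def _fill(self):
        """Pull one more chunk into the buffer; return False when exhausted."""
        try:
            self._buffer += next(self._chunks)
            return True
        except StopIteration:
            return False

    def read(self, size=-1):
        # Keep pulling chunks until we have enough bytes (or everything).
        while (size < 0 or len(self._buffer) < size) and self._fill():
            pass
        if size < 0:
            size = len(self._buffer)
        data, self._buffer = self._buffer[:size], self._buffer[size:]
        return data

    def readline(self):
        # Keep pulling chunks until a newline shows up or input ends.
        while b'\n' not in self._buffer and self._fill():
            pass
        end = self._buffer.find(b'\n') + 1 or len(self._buffer)
        data, self._buffer = self._buffer[:end], self._buffer[end:]
        return data


# Lines may straddle chunk boundaries; the adapter has to stitch them together.
stream = IteratorInput([b'a=1\nb', b'=2\n', b'tail'])
first = stream.readline()
second = stream.readline()
rest = stream.read()
```

Every call here still blocks inside 'next()' whenever the underlying iterator has to wait for data, so the wrapper buys a parser its 'readline()' method but gives an asynchronous application nothing.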
So, I'm thinking we should shift the burden to an async-specific API. But,
in this case, "burden" means that we get to give asynchronous apps an API
much more suited to their use cases.
Suppose that we did something similar to 'wsgi.file_wrapper'? That is,
suppose we had an optional extension that a server could provide, to wrap
specialized application object(s) in a fashion that then provides backward
compatibility with the spec?
That is, suppose we had a 'wsgi.async_wrapper', used like this:

    if 'wsgi.async_wrapper' in environ:
        controller = environ['wsgi.async_wrapper'](environ)
        # do stuff with controller, like register its
        # methods as callbacks
        return controller

The idea is that this would create an iterator that the server/gateway
could recognize as "special", similar to the file-wrapper trick. But, the
object returned would provide an extra API for use by the asynchronous
application, maybe something like:

    put(data) -- queue data for retrieval when the controller is iterated over

    finish() -- mark the iterator finished, so it raises StopIteration

    on_get(length, callback) -- call 'callback(data)' when 'length' bytes
        are available on 'wsgi.input' (but return immediately from the
        'on_get()' call)

While this API is an optional extension, it seems it would be closer to
what some async fans wanted, and less of a kludge. It won't do away with
the possibility that middleware might block waiting for input, of course,
but when no middleware is present or the middleware isn't transforming the
input stream, it should work out quite well.
In any case, implementing the methods and the iterator interface is
pretty straightforward, for either synchronous or asynchronous servers.
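
As a rough illustration of the synchronous case, the sketch below shows one way a blocking server might back the three methods. It is only a sketch under stated assumptions: the class name 'SyncController', the choice to defer reads until the controller is iterated, and the 'RuntimeError' on a stalled controller are all invented here, not part of the proposal. It uses the current '__next__' spelling of the iterator protocol.

```python
import io
from collections import deque


class SyncController:
    """Hypothetical synchronous stand-in for a 'wsgi.async_wrapper' result."""

    def __init__(self, environ):
        self._input = environ['wsgi.input']
        self._output = deque()    # data queued by put()
        self._pending = deque()   # (length, callback) pairs from on_get()
        self._finished = False

    def put(self, data):
        self._output.append(data)

    def finish(self):
        self._finished = True

    def on_get(self, length, callback):
        # Only record the request, so this call returns immediately.
        self._pending.append((length, callback))

    def __iter__(self):
        return self

    def __next__(self):
        while True:
            if self._output:
                return self._output.popleft()
            if self._pending:
                # A blocking server simply reads now and fires the callback,
                # which may in turn call put(), finish() or on_get().
                length, callback = self._pending.popleft()
                callback(self._input.read(length))
                continue
            if self._finished:
                raise StopIteration
            # Assumption: a synchronous server has nothing to wait on here.
            raise RuntimeError('controller stalled without finish()')


def echo_app(environ, start_response):
    """Toy application: upper-case the first five bytes of the request body."""
    start_response('200 OK', [('Content-Type', 'text/plain')])
    controller = SyncController(environ)

    def got_body(data):
        controller.put(data.upper())
        controller.finish()

    controller.on_get(5, got_body)
    return controller


environ = {'wsgi.input': io.BytesIO(b'hello')}
body = list(echo_app(environ, lambda status, headers: None))
```

An asynchronous server would presumably keep the same three methods but fire the 'on_get()' callbacks from its event loop when bytes arrive, and yield control instead of raising when nothing is queued yet.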
What do y'all think? I'd especially like feedback from Twisted folk, as to
whether this looks anything like the right kind of API for async apps. (I
expect it will need some tweaking and tuning.)
But if this is the overall right approach, I'd like to drop the current
proposals to make 'wsgi.input' an iterator and add optional
'pause'/'resume' APIs, since they were rather kludgy compared to giving
async apps their own mini-API for nonblocking I/O.
Comments? Questions?