[Web-SIG] A more Twisted approach to async apps in WSGI

Thu Sep 23 03:56:36 CEST 2004

Hi all.  I've been away for a few days due to loss of e-mail service when 
my dedicated server lost a hard drive.  Unfortunately my ISP didn't support 
the OS version any more, so I had to rebuild everything for the new OS version.

Anyway, on to the topic of my post.  Should 'wsgi.input' become an 
iterator?  Or should we develop a different API for asynchronous applications?

On the positive side of the iterator approach, it could make it easier for 
asynchronous applications to pause waiting for input, and it could in 
principle support "chunked" transfer encoding of the input stream.

However, since we last discussed this, I did some Googling on CGI and 
chunked encoding.  By far the most popular links regarding chunked 
encoding and CGI are all about bugs in IIS and Apache leading to 
various vulnerabilities when chunked encoding is used.  :(

Once you get past those items (e.g. by adding "-IIS -vulnerability" to your 
search), you then find *our* discussion here on the Web-SIG!  Finally, 
digging further, I found some 1998 discussion from the IPP (Internet 
Printing Protocol!) mailing list about which HTTP/1.1 servers support 
chunked encoding for CGI and which don't.

Anyway, the long and short of it is that CGI and chunked encoding are quite 
simply incompatible, which means that relying on chunked input being 
available would be nonportable in a WSGI application anyway.

That leaves the asynchronous use case, but the benefit is rather strained 
at that point.  Many frameworks reuse the 'cgi' module's 'FieldStorage' 
class in order to parse browser input, and the 'cgi' module's 
implementation requires an object with a 'readline()' method.  That means 
that if we switch from an input stream to an iterator, a lot of people are 
going to be trying to make sensible wrappers to convert the iterator back 
to an input stream, and that's just getting ridiculous, especially since in 
many cases the server or gateway has a file-like object to start with.
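
To make the cost concrete, here is a rough sketch of the kind of wrapper 
people would end up writing.  The class name 'IterInput' is made up for 
illustration, and the code assumes a modern Python where the input chunks 
are byte strings; it is not part of any proposal.

```python
class IterInput:
    """File-like wrapper over an iterator of byte strings (hypothetical)."""

    def __init__(self, chunks):
        self._chunks = iter(chunks)
        self._buf = b""

    def _fill(self):
        # Pull one more chunk; return False once the iterator is exhausted.
        try:
            self._buf += next(self._chunks)
            return True
        except StopIteration:
            return False

    def read(self, size=-1):
        # Keep pulling chunks until 'size' bytes are buffered or input ends.
        while (size < 0 or len(self._buf) < size) and self._fill():
            pass
        if size < 0:
            size = len(self._buf)
        data, self._buf = self._buf[:size], self._buf[size:]
        return data

    def readline(self, size=-1):
        # Pull chunks until a newline shows up, the size limit is reached,
        # or the iterator runs dry.
        while True:
            idx = self._buf.find(b"\n")
            if idx >= 0:
                end = idx + 1
                break
            if 0 <= size <= len(self._buf):
                end = size
                break
            if not self._fill():
                end = len(self._buf)
                break
        if size >= 0:
            end = min(end, size)
        data, self._buf = self._buf[:end], self._buf[end:]
        return data
```

Note that every 'readline()' call here still blocks inside 'next()', so a 
wrapper like this buys an asynchronous application nothing; it only 
restores the interface the server probably had to begin with.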

So, I'm thinking we should shift the burden to an async-specific API.  But, 
in this case, "burden" means that we get to give asynchronous apps an API 
much more suited to their use cases.

Suppose that we did something similar to 'wsgi.file_wrapper'?  That is, 
suppose we had an optional extension that a server could provide, to wrap 
specialized application object(s) in a fashion that then provides backward 
compatibility to the spec?

That is, suppose we had a 'wsgi.async_wrapper', used like this:

     if 'wsgi.async_wrapper' in environ:
         controller = environ['wsgi.async_wrapper'](environ)
         # do stuff with controller, like register its
         # methods as callbacks
         return controller

The idea is that this would create an iterator that the server/gateway 
could recognize as "special", similar to the file-wrapper trick.  But, the 
object returned would provide an extra API for use by the asynchronous 
application, maybe something like:

     put(data) -- queue data for retrieval when the controller is iterated over

     finish() -- mark the iterator finished, so it raises StopIteration

     on_get(length, callback) -- call 'callback(data)' when 'length' bytes
     are available on 'wsgi.input' (but return immediately from the
     'on_get()' call)

While this API is an optional extension, it seems it would be closer to 
what some async fans wanted, and less of a kludge.  It won't do away with 
the possibility that middleware might block waiting for input, of course, 
but when no middleware is present or the middleware isn't transforming the 
input stream, it should work out quite well.

In any case, implementing the methods and the iterator interface is 
pretty straightforward, for both synchronous and asynchronous servers.
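
As a rough illustration of the synchronous case, a wrapper might look 
something like the sketch below.  The names 'SyncAsyncWrapper' and 
'echo_app' are invented for the example, and two behaviors are my own 
assumptions rather than anything settled: data queued before 'finish()' 
is still delivered, and iteration also stops when nothing is queued and no 
callbacks are pending (a synchronous server has no other event source to 
wait on).

```python
import io
from collections import deque


class SyncAsyncWrapper:
    """Hypothetical synchronous stand-in for 'wsgi.async_wrapper'."""

    def __init__(self, environ):
        self._input = environ['wsgi.input']
        self._output = deque()    # data queued by put()
        self._pending = deque()   # (length, callback) pairs from on_get()
        self._finished = False

    def put(self, data):
        self._output.append(data)

    def finish(self):
        self._finished = True

    def on_get(self, length, callback):
        # Return immediately; the read happens later, during iteration.
        self._pending.append((length, callback))

    def __iter__(self):
        return self

    def __next__(self):
        while not self._output:
            if self._finished or not self._pending:
                raise StopIteration
            length, callback = self._pending.popleft()
            # A synchronous server simply does a blocking read here.
            callback(self._input.read(length))
        return self._output.popleft()


def echo_app(environ, start_response):
    """Example application: upper-case the request body."""
    start_response('200 OK', [('Content-Type', 'text/plain')])
    length = int(environ.get('CONTENT_LENGTH') or 0)

    if 'wsgi.async_wrapper' in environ:
        controller = environ['wsgi.async_wrapper'](environ)

        def got_body(data):
            controller.put(data.upper())
            controller.finish()

        controller.on_get(length, got_body)
        return controller

    # Fallback for servers without the extension: plain blocking read.
    return [environ['wsgi.input'].read(length).upper()]
```

An asynchronous server would provide the same three methods, but would 
fire the 'on_get()' callbacks from its event loop and treat an empty queue 
as "pause" rather than as the end of iteration.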

What do y'all think?  I'd especially like feedback from Twisted folk, as to 
whether this looks anything like the right kind of API for async apps.  (I 
expect it will need some tweaking and tuning.)

But if this is the overall right approach, I'd like to drop the current 
proposals to make 'wsgi.input' an iterator and add optional 
'pause'/'resume' APIs, since they were rather kludgy compared to giving 
async apps their own mini-API for nonblocking I/O.

Comments?  Questions?