[Web-SIG] A more Twisted approach to async apps in WSGI

James Y Knight foom at fuhm.net
Tue Oct 5 06:52:54 CEST 2004


A bit late with the response...but better late than never I hope. ;)

On Sep 22, 2004, at 9:56 PM, Phillip J. Eby wrote:
> On the positive side of the iterator approach, it could make it easier 
> for asynchronous applications to pause waiting for input, and it could 
> in principle support "chunked" transfer encoding of the input stream.
>
> Anyway, the long and short of it is that CGI and chunked encoding are 
> quite simply incompatible, which means that relying on its 
> availability would be nonportable in a WSGI application anyway.

I do not find that a good reason to copy the mistake (not supporting 
chunking) to a new API.

However! I don't think that the file-like-object API even has a problem 
with chunked incoming data. As long as WSGI does not make 
CONTENT_LENGTH a required header, and as long as the result of read 
looks different for "more data still to come" and "data finished" (it 
does: blocking until more data arrives vs. returning ''), I think it 
should be fine (for non-async apps). Am I missing something here?
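
A minimal sketch of that read loop, assuming a hypothetical helper named 
read_body and a server that has already de-chunked the request body. It 
is written for current Python, where the empty result is b'' rather 
than ''; io.BytesIO only stands in for the real input stream.

```python
import io


def read_body(stream, block_size=8192):
    """Read a wsgi.input-style stream to its end without CONTENT_LENGTH.

    Hypothetical helper: it relies only on read() blocking while more
    data may still arrive and returning an empty result once the data
    is finished.
    """
    chunks = []
    while True:
        block = stream.read(block_size)
        if not block:  # empty result means "data finished"
            break
        chunks.append(block)
    return b"".join(chunks)


# io.BytesIO stands in for a server-provided input stream.
body = read_body(io.BytesIO(b"hello " * 3), block_size=4)
```

Nothing in the loop consults CONTENT_LENGTH, which is why the header 
would not need to be required for this to work.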

> [...] That means that if we switch from an input stream to an 
> iterator, a lot of people are going to be trying to make sensible 
> wrappers to convert the iterator back to an input stream, and that's 
> just getting ridiculous, [...]

An iterable input stream does seem like it may be a loser for the 
common case.

> So, I'm thinking we should shift the burden to an async-specific API.  
> But, in this case, "burden" means that we get to give asynchronous 
> apps an API much more suited to their use cases.
> [...]
> The idea is that this would create an iterator that the server/gateway 
> could recognize as "special", similar to the file-wrapper trick.  But, 
> the object returned would provide an extra API for use by the 
> asynchronous application, maybe something like:
>
>     put(data) -- queue data for retrieval when the controller is
>     iterated over
>
>     finish() -- mark the iterator finished, so it raises StopIteration
>
>     on_get(length,callback) -- call 'callback(data)' when 'length'
>     bytes are available on 'wsgi.input' (but return immediately from
>     the 'on_get()' call)
>
> While this API is an optional extension, it seems it would be closer 
> to what some async fans wanted, and less of a kludge.  It won't do 
> away with the possibility that middleware might block waiting for 
> input, of course, but when no middleware is present or the middleware 
> isn't transforming the input stream, it should work out quite well.

That sounds okay. I'd specify that the on_get "length" bit is a hint, 
and may or may not be honored. put/finish is the right API for output 
(although I'd call it write/finish myself), and on_get seems like a 
fairly usable API for input. It doesn't let you pause the incoming 
data, so if you're passing it on to a slow downstream you'll 
potentially need to buffer a lot, but maybe that's too much to ask for. 
I assume callback('') is used to indicate end of incoming data: that 
should be specified.
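
A rough sketch of how such an object might behave, under stated 
assumptions: the class name AsyncController and the server-side hooks 
feed() and close_input() are hypothetical and not part of the proposal. 
Only put, finish and on_get come from the text above. Here "length" is 
treated as a hint, and callback(b'') marks the end of incoming data.

```python
from collections import deque


class AsyncController:
    """Hypothetical sketch of the proposed put/finish/on_get extension."""

    def __init__(self):
        self._out = deque()     # blocks queued by put()
        self._finished = False
        self._in = b""          # request data buffered so far
        self._in_closed = False
        self._waiting = None    # (length_hint, callback) or None

    # Output side, used by the application.
    def put(self, data):
        self._out.append(data)

    def finish(self):
        self._finished = True

    # Iteration side, used by the server.
    def __iter__(self):
        return self

    def __next__(self):
        if self._out:
            return self._out.popleft()
        if self._finished:
            raise StopIteration
        return b""  # assumption: nothing ready yet, so hand back an empty block

    # Input side, used by the application.
    def on_get(self, length, callback):
        # Never waits for data; 'length' is only a hint. The callback
        # may fire right away if data is already buffered.
        self._waiting = (length, callback)
        self._deliver()

    # Hypothetical hooks the server would call as request data arrives.
    def feed(self, data):
        self._in += data
        self._deliver()

    def close_input(self):
        self._in_closed = True
        self._deliver()

    def _deliver(self):
        if self._waiting is None:
            return
        length, callback = self._waiting
        if self._in:
            data, self._in = self._in[:length], self._in[length:]
        elif self._in_closed:
            data = b""  # end of incoming data
        else:
            return      # nothing to deliver yet
        self._waiting = None
        callback(data)


ctl = AsyncController()
received = []
ctl.on_get(1024, received.append)  # no data yet, so nothing is delivered
ctl.feed(b"abc")                   # callback fires with what is available
ctl.on_get(1024, received.append)
ctl.close_input()                  # callback(b"") signals end of input
ctl.put(b"response")
ctl.finish()
output = list(ctl)
```

Note that this sketch has no way to slow the sender down: feed() just 
keeps buffering, which is the limitation described above.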

However, interaction with middleware seems quite tricky here:
- For input modifying middleware: I guess on_get would have to just 
raise an exception if wsgi.input has been replaced. If the input stream 
was iterable, an on_get callback could just be considered notice that 
you can iterate the input stream once without blocking, assuming the 
block boundary requirements were also in effect here. Then it would 
work right even if the input stream was replaced. However, I think it 
might be the case that middleware that wants to modify the input stream 
is so rare, it doesn't really matter.
- Output. The block boundary section implies that middleware that 
follows the guidelines, and doesn't do any blocking operations of its 
own should work without worrying about the server and application being 
async or sync. If this is to work, the server cannot expect to actually 
receive an asyncwrapper iterable as the return value, even if the app 
is using it, because the middleware might be consuming that iterable 
and returning one of its own. This means the .put/.next methods should 
communicate out-of-band, effectively calling pause/resume functions in 
the server so it knows when it's safe to iterate the vanilla iterator 
the middleware returned without the middleware blocking when calling 
the asyncwrapper-iterator.
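
The output case can be sketched as follows. Everything here is an 
assumption made for illustration: OutOfBandController, ToyServer, its 
resume() and pump() methods, and upper_middleware are hypothetical 
names, and a real server would drive this from its event loop rather 
than from explicit pump() calls. The point shown is that the server 
only ever holds the middleware's plain iterator, and learns 
out-of-band when it is safe to advance it.

```python
class OutOfBandController:
    """Hypothetical: put()/finish() notify the server out-of-band."""

    def __init__(self, resume):
        self._resume = resume  # server-supplied callable (assumed)
        self._queue = []
        self.finished = False

    def put(self, data):
        self._queue.append(data)
        self._resume()  # one more next() is now safe

    def finish(self):
        self.finished = True
        self._resume()  # lets the server observe StopIteration

    def __iter__(self):
        return self

    def __next__(self):
        if self._queue:
            return self._queue.pop(0)
        if self.finished:
            raise StopIteration
        return b""  # iterated without a notification; nothing to send


class ToyServer:
    """Stand-in for an event-driven server; counts resume notifications."""

    def __init__(self):
        self.ready = 0
        self.sent = []
        self.done = False

    def resume(self):
        self.ready += 1

    def pump(self, result_iter):
        # Advance the (possibly middleware-wrapped) iterator once per
        # notification, so neither server nor middleware ever waits.
        while self.ready and not self.done:
            self.ready -= 1
            try:
                self.sent.append(next(result_iter))
            except StopIteration:
                self.done = True


def upper_middleware(app_iter):
    # Middleware following the block-boundary guideline: one block in,
    # one block out, no blocking operations of its own.
    for block in app_iter:
        yield block.upper()


server = ToyServer()
ctl = OutOfBandController(server.resume)
result = upper_middleware(ctl)  # the server never sees ctl itself
ctl.put(b"hello")
server.pump(result)
ctl.put(b"world")
ctl.finish()
server.pump(result)
```

This only works because the middleware produces one output block per 
input block; middleware that buffers would break the one-notification, 
one-iteration bookkeeping.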

> But if this is the overall right approach, I'd like to drop the 
> current proposals to make 'wsgi.input' an iterator and add optional 
> 'pause'/'resume' APIs, since they were rather kludgy compared to 
> giving async apps their own mini-API for nonblocking I/O.

Perhaps Peter Hunt could try to implement it in his twisted wsgi 
gateway and see if it works out. :)

James


