Re: [Twisted-web] Re: [Web-SIG] A more Twisted approach to async apps in WSGI

5 Oct 2004

      At 12:52 AM 10/5/04 -0400, James Y Knight wrote:
...
A bit late with the response...but better late than never I hope. ;)
On Sep 22, 2004, at 9:56 PM, Phillip J. Eby wrote:
...
On the positive side of the iterator approach, it could make it easier 
for asynchronous applications to pause waiting for input, and it could in 
principle support "chunked" transfer encoding of the input stream.
Anyway, the long and short of it is that CGI and chunked encoding are 
quite simply incompatible, which means that relying on its availability 
would be nonportable in a WSGI application anyway.
I do not find that a good reason to copy the mistake (not supporting 
chunking) to a new API.
Perhaps not, but there are also lots of other reasons not to support 
chunked input, mainly that a Google search for "chunked encoding CGI" turns 
up reams of vulnerabilities that suggest existing HTTP implementations may 
leave a bit to be desired with respect to accepting a POST of chunked 
input.  :)
...
However! I don't think that the file-like-object API even has a problem 
with chunked incoming data. As long as WSGI does not make CONTENT_LENGTH a 
required header, and as long as the result of read looks different for 
"more data still to come" and "data finished" (it does, blocking for more 
data to occur vs. returning ''), I think it should be fine (for non-async 
apps). Am I missing something here?
I don't think so.  Although you probably want something more like a pipe 
error if the input times out or the connection is broken.
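
As an illustration of the read() behaviour discussed above, here is a minimal sketch of a blocking (non-async) application draining 'wsgi.input' without consulting CONTENT_LENGTH. The helper name 'read_body' and the 'block_size' parameter are made up for this example, and it is written for Python 3, so the empty string '' of the discussion shows up as b''. It assumes the server's input object blocks while more data is on the way and returns an empty result only at end of input.

```python
import io


def read_body(stream, block_size=8192):
    """Read a request body until read() signals end of input.

    Assumption: stream.read() blocks while more data may still arrive
    and returns an empty bytes object only once the body is finished.
    """
    chunks = []
    while True:
        data = stream.read(block_size)
        if not data:  # empty result means "data finished"
            break
        chunks.append(data)
    return b"".join(chunks)


# Stand-in for environ['wsgi.input']; a real server would supply its own object.
body = read_body(io.BytesIO(b"hello world"), block_size=4)
```

A timeout or broken connection would presumably surface as an exception raised from read() rather than as an empty result, which keeps it distinct from a normal end of input.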
...
...
So, I'm thinking we should shift the burden to an async-specific API.
But, in this case, "burden" means that we get to give asynchronous apps 
an API much more suited to their use cases.
[...]
The idea is that this would create an iterator that the server/gateway 
could recognize as "special", similar to the file-wrapper trick.  But, 
the object returned would provide an extra API for use by the 
asynchronous application, maybe something like:
put(data) -- queue data for retrieval when the controller is 
iterated over
finish() -- mark the iterator finished, so it raises StopIteration
on_get(length,callback) -- call 'callback(data)' when 'length' bytes 
are available on 'wsgi.input' (but return immediately from the 'on_get()' call)
While this API is an optional extension, it seems it would be closer to 
what some async fans wanted, and less of a kludge.  It won't do away with 
the possibility that middleware might block waiting for input, of course, 
but when no middleware is present or the middleware isn't transforming 
the input stream, it should work out quite well.
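
To make the proposed extension a little more concrete, here is a rough sketch of what such a wrapper object might look like. Only 'put', 'finish' and 'on_get' come from the proposal; the class name 'AsyncController' and the server-side hooks 'feed_input' and 'dispatch' are invented for illustration, and nothing here is part of WSGI. It is written for Python 3, so data is bytes.

```python
from collections import deque


class AsyncController:
    """Hypothetical sketch of the put/finish/on_get wrapper."""

    def __init__(self):
        self._out = deque()    # output queued by put()
        self._finished = False
        self._in = b""         # input buffered so far (stand-in for wsgi.input)
        self._pending = None   # (length, callback) registered by on_get()

    # Application-facing API, named as in the proposal.
    def put(self, data):
        """Queue data for retrieval when the controller is iterated over."""
        self._out.append(data)

    def finish(self):
        """Mark the iterator finished, so it raises StopIteration once drained."""
        self._finished = True

    def on_get(self, length, callback):
        """Register a one-shot callback; returns immediately without calling it."""
        self._pending = (length, callback)

    # Server-facing hooks (assumed names, not from the proposal).
    def feed_input(self, data):
        """Called by the server when request body data arrives."""
        self._in += data
        self.dispatch()

    def dispatch(self):
        """Called from the server's event loop to fire a satisfied callback."""
        if self._pending is None:
            return
        length, callback = self._pending
        if len(self._in) >= length:
            self._pending = None  # one-shot: clear before calling back
            data, self._in = self._in[:length], self._in[length:]
            callback(data)

    # Iterator protocol, as seen by the server or by middleware.
    def __iter__(self):
        return self

    def __next__(self):
        if self._out:
            return self._out.popleft()
        if self._finished:
            raise StopIteration
        return b""  # nothing queued yet: yield an empty string instead of blocking
```

An application would create the controller, register callbacks and return it as its response iterable; whether the gateway recognizes it as "special" or just iterates it like any other iterable is left open here.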
That sounds okay. I'd specify that the on_get "length" bit is a hint, and 
may or may not be honored. put/finish is the right API for output 
(although I'd call it write/finish myself),
The reason for not using 'write' is to avoid confusion with the existing 
"write" callable, both in terms of knowing which one we're talking about, 
and in terms of not confusing the semantics, which may differ subtly 
between the two.
...
and on_get seems like a fairly usable API for input. It doesn't let 
you pause the incoming data,
Actually it does; it's supposed to be a one-shot.  You have to call it 
again if you want to get called back again.
...
so if you're passing it on to a slow downstream you'll potentially need 
to buffer a lot, but maybe that's too much to ask for. I assume 
callback('') is used to indicate end of incoming data: that should be 
specified.
I missed that entirely, but it sounds like a good idea.
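
The one-shot behaviour and the end-of-input convention can be sketched as follows. This is only an illustration: 'OneShotInput', 'deliver' and 'collect' are hypothetical names, the 'length' argument is treated as a hint and ignored, and callback(b'') as the end-of-input marker is the suggestion made above rather than anything specified. Python 3, so '' becomes b''.

```python
class OneShotInput:
    """Hypothetical stand-in for the input side of the async wrapper."""

    def __init__(self):
        self._callback = None

    def on_get(self, length, callback):
        """Register a callback for the next piece of input (length is a hint)."""
        self._callback = callback

    def deliver(self, data):
        """Server side: pass data to the waiting callback, if there is one.

        Returns False when no callback is registered, i.e. the application
        has paused and the server should hold on to the data.
        """
        callback, self._callback = self._callback, None  # one-shot
        if callback is None:
            return False
        callback(data)
        return True


def collect(source, chunks, done):
    """Read everything from source, re-registering after each callback."""

    def on_data(data):
        if data == b"":
            done.append(True)          # empty string: end of incoming data
            return                     # do not re-register
        chunks.append(data)
        source.on_get(4096, on_data)   # call on_get again to be called back again

    source.on_get(4096, on_data)
```

An application that wants to pause the incoming data simply does not call on_get() again until its downstream has caught up.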
...
However, interaction with middleware seems quite tricky here:
- For input modifying middleware: I guess on_get would have to just raise 
an exception if wsgi.input has been replaced.
Yep.  Although it might be that the wrapper would just refuse to 
instantiate in the first place in that circumstance.
...
If the input stream was iterable, an on_get callback could just be 
considered notice that you can iterate the input stream once without 
blocking, assuming the block boundary requirements were also in effect here.
Yes, but this'd only work if the input were an iterator.  input.read() 
returning an empty string would mean EOF, so the boundary stuff doesn't 
work in that case.
...
- Output. The block boundary section implies that middleware that follows 
the guidelines and doesn't do any blocking operations of its own should 
work without worrying about the server and application being async or 
sync. If this is to work, the server cannot expect to actually receive an 
asyncwrapper iterable as the return value, even if the app is using it, 
because the middleware might be consuming that iterable and returning one 
of its own.
Correct.
...
This means the .put/.next methods should communicate out-of-band, 
effectively calling pause/resume functions in the server so it knows when 
it's safe to iterate the vanilla iterator the middleware returned without 
the middleware blocking when calling the asyncwrapper-iterator.
It could do that, certainly.  But, the truth is it's *always* safe to 
iterate.  Note that the application can just use the on_get callback to set 
a flag that it's ready to continue, and just keep yielding empty strings 
till then.

More to the point, the iterator-wrapper can simply yield empty strings when 
its internal queue is empty, and a sensible async server should back off 
its iterator.next() retry attempts when an application yields empty 
strings.  This is pretty much always safe and sensible.

However, the out-of-band communication you describe can also take place, 
since it provides better communication in the case where the extension is 
available.
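
For what it's worth, the "back off when the application yields empty strings" behaviour might look roughly like this on the server side. It is a deliberately simplified, blocking sketch: the function name 'drive', the 'send' and 'sleep' parameters and the delay values are all assumptions, and a real async server would reschedule the next iteration on its event loop instead of sleeping.

```python
import time


def drive(app_iter, send, sleep=time.sleep, initial_delay=0.01, max_delay=1.0):
    """Iterate an application's response, backing off on empty strings.

    Each consecutive empty chunk doubles the wait before the next retry,
    up to max_delay; any real data resets the wait to initial_delay.
    """
    delay = initial_delay
    for chunk in app_iter:
        if chunk:
            send(chunk)
            delay = initial_delay
        else:
            # The application has nothing ready yet: wait before asking again.
            sleep(delay)
            delay = min(delay * 2, max_delay)


# Example run with a fake sleep so nothing actually waits.
sent, naps = [], []
drive(iter([b"", b"", b"a", b"", b"b"]), sent.append, sleep=naps.append)
```

With the out-of-band pause/resume extension available, the server could skip the retry loop entirely and iterate only when the wrapper says there is something to fetch.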