[Web-SIG] WSGI 2.0

Fri Mar 30 20:14:57 CEST 2007

At 12:42 PM 3/30/2007 -0500, Ian Bicking wrote:
>Phillip J. Eby wrote:
>>I was thinking of something a bit simpler; the environ key would be an 
>>object that, when called, tells the server that it's okay to resume 
>>iteration attempts on the application.  A sort of "put me back on the 
>>queue for iteration" call.  The callback would have to be safe to call 
>>from any thread at any time, and must not re-enter anything, just 
>>re-enable iteration.
>
>OK, that makes sense.  So there's something like 
>environ['wsgi.server_resume'] in the environment, and the app yields 
>something that indicates a pause, then calls that value to undo the pause?

Yep.  I guess we should distinguish here between "pause but poll" and 
"pause and wait for the callback".  i.e., the operations might be something 
like:

PAUSE_AND_POLL
PAUSE_AND_WAIT
FLUSH
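
To make the three operations concrete, here is a rough sketch of how a server loop might treat them. Nothing here is settled API: the sentinel objects, the serve() helper and the use of a threading.Event behind environ['wsgi.server_resume'] are all assumptions made for illustration, and a real async server would reschedule the iterator rather than block a thread.

```python
import threading

# Hypothetical signal objects; the thread does not say what they would be.
PAUSE_AND_POLL = object()
PAUSE_AND_WAIT = object()
FLUSH = object()


def serve(app_iter, resume_event, write, flush, poll_interval=0.01):
    """Drive one response iterable, honoring the three signals."""
    for item in app_iter:
        if item is FLUSH:
            flush()
        elif item is PAUSE_AND_POLL:
            # Retry iteration after a short delay, or sooner if resumed.
            resume_event.wait(poll_interval)
            resume_event.clear()
        elif item is PAUSE_AND_WAIT:
            # Do not touch the iterator again until the callback fires.
            resume_event.wait()
            resume_event.clear()
        else:
            write(item)


def demo():
    resume = threading.Event()
    # Event.set is safe to call from any thread and re-enters nothing.
    environ = {"wsgi.server_resume": resume.set}
    out = []

    def app(environ):
        yield b"first"
        # Arrange for another thread to resume us, then pause.
        threading.Timer(0.05, environ["wsgi.server_resume"]).start()
        yield PAUSE_AND_WAIT
        yield b"second"
        yield FLUSH

    serve(app(environ), resume, out.append, lambda: out.append("<flush>"))
    return out
```

Calling demo() collects the two body blocks followed by the flush marker; the callback only re-enables iteration and never calls back into the application.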

>>>>Ideally, this should be done in a way that's easy for middleware to 
>>>>handle; a flush signal should be handled by the middleware *and* passed 
>>>>up the chain, while any other async signals would be passed directly up 
>>>>the chain (unless it's something like "pause for input" and the 
>>>>middleware controls the input).
>>>>If we do this right, it should be easier to write middleware that works 
>>>>correctly with respect to buffering, since the issues of flushing and 
>>>>pausing now become explicit rather than implicit.  (This should make it 
>>>>easier to teach/learn as well.)
>>>
>>>In terms of buffering, I can't think of many cases where it would 
>>>matter.  Either the middleware passes back the response with no changes, 
>>>or it needs to consume the entire response body (and probably headers 
>>>and maybe status) to do whatever transformation it needs to do.
>>>
>>>Things like pauses and async signals would ideally be passed upstream, 
>>>but flushes and content would all be consumed by the middleware.
>>I can't think of any condition where middleware would *not* pass all of 
>>these up to its caller.  In the case of a "flush", it needs to first 
>>yield any buffered output, but it *must* still yield the flush.
>
>Is there any use to this?  If you are transforming output, the flush is 
>unlikely to flush anything; all output will be buffered.

That depends on whether the transformation is of a streaming nature.  If 
you're talking about things that e.g. apply XSL or some such, those are 
probably really MFCs rather than true middleware, and it's okay for an MFC 
to have more constraints on its wrapped application than transparent 
middleware does.

>>For example, if you're doing server push, then the app should yield a 
>>flush prior to each new content boundary.  If the middleware is doing 
>>compression or some such, then it needs to restart encoding after each 
>>content boundary, as well as flush the prior encoded output.
>
>I suppose server push is the only place where flush really matters, and 
>most output transformations will simply break server push.

More precisely, they should just not apply their transformations to a 
multipart content type, unless they know how to handle it.

However, there is another place where flow control matters, and that is 
streaming files which are too large to practically buffer in memory.  Such 
files need a way to "suggest" that they be split into smaller blocks.

Having a requirement that flow control be passed through allows us to 
ensure that middleware doesn't try to consume the whole response, you see.
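
As an illustration of the pass-through requirement, here is a minimal sketch of a streaming middleware that holds blocks only until the next flush. It assumes a calling convention where the app returns (status, headers, content) and a hypothetical FLUSH sentinel object; neither is a decided part of WSGI 2.0.

```python
FLUSH = object()  # hypothetical flush signal


def upper_middleware(app):
    """Streaming transform that buffers blocks but honors FLUSH."""
    def wrapped(environ):
        status, headers, content = app(environ)

        def body():
            pending = []
            for item in content:
                if item is FLUSH:
                    # Emit whatever we are holding, then pass the flush on.
                    if pending:
                        yield b"".join(pending).upper()
                        pending.clear()
                    yield FLUSH
                elif isinstance(item, bytes):
                    pending.append(item)
                else:
                    # Any other signal goes straight up the chain.
                    yield item
            if pending:
                yield b"".join(pending).upper()

        return status, headers, body()
    return wrapped
```

Because the flush is re-yielded after the buffered output, the middleware never needs to hold more than one flush-delimited segment in memory.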

In WSGI 1.0, we handle this by treating *every* block as if it were 
followed by a flush, but in 2.0 I'd like to accommodate the fact that many 
people seem to think that yielding is like using "print" in CGI.

I'm not married to the specific mechanism we use, but I *would* like to see 
WSGI 2.0 make it easy for middleware authors to comply in such a way as to 
handle streaming and push correctly.

Hm.  Maybe what we need is a way to specify the *type* of response, so that 
middleware can ignore what it can't handle...  e.g.:

    def simple_app(environ):
        return resp_type, status, headers, content

Then if the response type is STREAM or ASYNC, the middleware could opt out 
of it, returning the response as-is.
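
A sketch of what that opt-out might look like in a buffering middleware follows. The constant values and the NORMAL type are made up for the example (only STREAM and ASYNC are named above), and the four-tuple return is just the strawman signature from this message.

```python
# Hypothetical response-type constants.
NORMAL, STREAM, ASYNC = "normal", "stream", "async"


def buffering_middleware(app):
    """Buffer whole responses, but leave STREAM/ASYNC ones alone."""
    def wrapped(environ):
        response = app(environ)
        resp_type, status, headers, content = response
        if resp_type in (STREAM, ASYNC):
            # Can't handle it, so return the response as-is.
            return response
        body = b"".join(content)
        headers = [(k, v) for k, v in headers
                   if k.lower() != "content-length"]
        headers.append(("Content-Length", str(len(body))))
        return resp_type, status, headers, [body]
    return wrapped
```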

OTOH, adding an extra return value seems like a pain when so few 
applications would use it, and so little middleware would care.  Maybe it 
would be better to add something to the start of the status string, 
instead?  E.g. "if status.startswith('!'): return original_response"?
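
Spelled out, the status-prefix variant might look like the sketch below. It again assumes a (status, headers, content) return; the '!' marker is only the strawman suggested above, and what the server would do with the prefix is left open.

```python
def upper_middleware(app):
    """Uppercase the body unless the status is marked with '!'."""
    def wrapped(environ):
        original_response = app(environ)
        status, headers, content = original_response
        if status.startswith("!"):
            # Marked as stream/async: hand it back untouched.
            return original_response
        return status, headers, [b"".join(content).upper()]
    return wrapped
```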

>As long as the async signals are easy to detect (e.g., an integer or 
>tuple) then that's fine.
>
>--
>Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org
>             | Write code, do good | http://topp.openplans.org/careers