[Web-SIG] WSGI start_response exc_info argument

Wed Apr 6 03:26:00 CEST 2005

At 03:51 PM 4/5/05 -0500, Ian Bicking wrote:
>Phillip J. Eby wrote:
>>>But I don't mind all of that, because it is only contained in the error 
>>>catching middleware and nowhere else.  I have other middleware that
>>>overrides start_response, and don't want to bother with all the exc_info 
>>>in that case.
>>
>>Just pass it through to the upstream start_response; the top-level server 
>>is the only one that needs to care.
>>
>>>   And a lot of the logic -- like trying to show errors even when 
>>> there's been a partial response -- is just work, there's no way to get 
>>> around it.
>>
>>So leave it to the server.  All I'm saying is that there is no need to 
>>track whether the response has started.  It's the server's job to know 
>>that, and the opinion of middleware doesn't count here.  As long as the 
>>*server* hasn't sent the headers yet, you can restart the response.
>
>My concern is mostly that it is error-prone to leave it to the server, 
>because it's not something you can pass upward easily (AFAICT).

I don't understand.  If you want to implement an in-stream recovery 
middleware, you certainly can.  (But even then you don't need to track the 
state; if start_response() raises an error, you know the server above you 
has already sent the headers.  So you can always trap the error from 
start_response in order to *know* that you need in-stream recovery.)

And for anything that's *not* in-stream recovery middleware, you shouldn't 
care; just call start_response with exc_info and go about your
business.  If there is an upstream handler, it will throw exc_info back at
you if need be, then catch the error after it breaks out of your code.
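
For ordinary middleware that overrides start_response, passing exc_info through can be as small as this sketch. The middleware and the X-Example header are hypothetical; the point is only that exc_info is forwarded unchanged and the server decides what to do with it.

```python
def header_adding_middleware(app):
    """Hypothetical middleware that adds one response header."""
    def wrapped_app(environ, start_response):
        def custom_start_response(status, headers, exc_info=None):
            headers = list(headers) + [("X-Example", "1")]
            # Forward exc_info as-is; if the headers are already out, the
            # upstream start_response is the one that re-raises it.
            return start_response(status, headers, exc_info)
        return app(environ, custom_start_response)
    return wrapped_app
```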

The purpose of exc_info is to simplify application-level error handlers; 
they just pass the exc_info and proceed as they would for pre-stream 
recovery.  If pre-stream recovery is impossible, the error handler will be 
aborted and the server (or error-handling middleware) gets to take over.
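
A rough sketch of such a handler, written as a wrapper around a hypothetical inner application, follows. It buffers the body so that errors raised during iteration are caught, then calls start_response a second time with sys.exc_info(). Unlike in-stream recovery, it does not trap a re-raise: if the headers have already gone out, the call raises and the handler is simply aborted.

```python
import sys

def catch_errors(app):
    """Hypothetical framework-level error handler (sketch only)."""
    def wrapped(environ, start_response):
        try:
            result = app(environ, start_response)
            try:
                body = b"".join(result)  # buffer so iteration errors land here
            finally:
                if hasattr(result, "close"):
                    result.close()
        except Exception:
            # `app` may already have called start_response; passing exc_info
            # makes a second call legal.  If headers were already sent, this
            # re-raises and aborts the handler, letting the server take over.
            start_response("500 Internal Server Error",
                           [("Content-Type", "text/plain")],
                           sys.exc_info())
            return [b"Internal Server Error\n"]
        return [body]
    return wrapped
```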

>   I know my middleware is mostly not compliant with this part of the 
> spec, and it's not even clear to me how I'd fix them all.  I'm sure I 
> could figure it out, but most of WSGI doesn't require deep thought (and I 
> like that), and this part doesn't feel like that to me.

I think you're over-analyzing it and there isn't anything complex except 
the case of an in-stream handler, which is inherently complex due to the 
task.  But there's nothing stopping you from writing middleware that lies to its
downstream application when called with exc_info, by *not* re-throwing 
exc_info but instead attempting recovery.  Technically, this is against the 
letter of the spec, which says that if HTTP headers have been output you 
must abort.  (Although this is then loosened in the Error Handling section 
to say that error-handling middleware can just return without an exception.)

I personally still believe, however, that leaving middleware out of 
in-stream recovery is by far the best course of action, because a good
framework will buffer its output for the majority of human-readable pages, 
so in-stream recovery is only needed for streaming data or large files, 
where *only the application* knows what the safe way to recover 
is!  Therefore, having middleware attempt in-stream recovery is IMO 
inherently unsafe, unless it is tuned for precisely that particular 
application, amounting to little more than a monkey patch for that specific 
scenario.

To put it another way, if you think you need this, it's probably because 
the application isn't buffering properly.  In the common case, a WSGI 
application *should* be sending its output as a single block.  (See 
http://www.python.org/peps/pep-0333.html#buffering-and-streaming for details.)
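
As a rough illustration of sending output as a single block (render_page is a hypothetical stand-in for a framework's template layer), the page below is rendered completely before start_response is called. Nothing has been sent if rendering fails, so the error can always become a clean 500 and in-stream recovery never comes up.

```python
import html

def render_page(environ):
    # Hypothetical renderer standing in for a framework's templates.
    name = environ.get("QUERY_STRING") or "world"
    return "<p>Hello, %s!</p>" % html.escape(name)

def buffered_app(environ, start_response):
    try:
        body = render_page(environ).encode("utf-8")
    except Exception:
        # Nothing has been sent yet, so a clean error response is possible.
        status = "500 Internal Server Error"
        content_type = "text/plain"
        body = b"Internal Server Error"
    else:
        status = "200 OK"
        content_type = "text/html; charset=utf-8"
    start_response(status, [("Content-Type", content_type),
                            ("Content-Length", str(len(body)))])
    return [body]  # the whole response as one block
```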

>I'm trying to outsmart the servers, because I want to be able to control 
>the error handling independent of servers.  I'm trying to advocate that 
>servers be as dumb as possible, and I expect to trust them as little as 
>possible, so I don't want to leave stuff up to them.  And showing partial 
>responses is just Hard -- all the more reason to avoid leaving it up to 
>servers with all the implementations that exist.

Right; partial responses are hard, so don't do them except in *application* 
code.  99% of application output should be buffered, so in-stream recovery 
is irrelevant and useless.

>Well, I guess the idea is to let the error middleware do its thing, but 
>give the server an option to bail out gracefully if necessary (by raising 
>the exception passed in).  I think it's actually reasonable to have the 
>server bail out ungracefully -- or the middleware -- in those few cases 
>where there's a conflict.

It's allowed to; the spec just says it *should* raise exc_info, but it's 
allowed to raise something else.
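
On the server side, the logic is small. The sketch below uses a hypothetical per-request class and Python 3 re-raise syntax, and follows the "should" by re-raising the original exception. If headers have not been sent, a call with exc_info simply replaces the status and headers. A second call without exc_info is an error.

```python
class ResponseState:
    """Hypothetical per-request state inside a WSGI server (sketch only)."""

    def __init__(self):
        self.status = None
        self.headers = None
        self.headers_sent = False
        self.body = []

    def start_response(self, status, response_headers, exc_info=None):
        if exc_info:
            try:
                if self.headers_sent:
                    # Too late to restart: re-raise the original exception.
                    raise exc_info[1].with_traceback(exc_info[2])
            finally:
                exc_info = None  # drop the traceback to avoid a reference cycle
        elif self.status is not None:
            raise AssertionError("start_response called twice without exc_info")
        # Headers not sent yet: the new status and headers replace the old.
        self.status = status
        self.headers = response_headers
        return self.write

    def write(self, data):
        # A real server sends status and headers before the first non-empty
        # chunk; this sketch only records that it happened.
        if data:
            self.headers_sent = True
        self.body.append(data)
```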

>   It mostly only applies to cases where there are errors in the streamed
> output, which seems unlikely to me (at least in cases where there's 
> interactive debugging via a web browser).

Right, it's only for errors in streamed output that the exc_info argument 
can even be used; apart from that scenario it's a total red herring.

>Now that I'm thinking about it, can you remind me why WSGI doesn't work 
>like this:
>
>status, headers, body_iter = application(environ)
>print status
>print headers...
>for block in body_iter: ...
>body_iter.close()
>
>
>Why is there a start_response and then a separate return?

One reason is that it allows you to write an application as a 
generator.  But more importantly, it's necessary in order to support 
'write()' for backward compatibility with existing frameworks, and that's 
pretty much the "killer reason" it's structured how it is.  This particular 
innovation was Tony Lownds' brainchild, though, not mine.  In my original 
WSGI concept, the application received an output stream and just wrote 
headers and everything to it.
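
A minimal sketch of both points follows; the application names are hypothetical. Because status and headers travel through start_response rather than the return value, an application can be a plain generator. And because start_response returns a write() callable, a framework that pushes output imperatively can still be adapted; it then returns an empty iterable.

```python
def generator_app(environ, start_response):
    # Status and headers go out via start_response, so the body
    # can simply be yielded chunk by chunk.
    start_response("200 OK", [("Content-Type", "text/plain")])
    yield b"Hello, "
    yield b"world!\n"

def legacy_app(environ, start_response):
    # start_response returns write(), for frameworks that push output.
    write = start_response("200 OK", [("Content-Type", "text/plain")])
    write(b"Hello, ")
    write(b"world!\n")
    return []  # everything was already sent through write()
```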