[Web-SIG] Stuff left to be done on WSGI

Sat Aug 28 18:56:35 CEST 2004

At 11:51 PM 8/27/04 -0500, Ian Bicking wrote:
>I don't know if we need deeper hierarchy than that.  E.g., 
>web.wsgi.cgiadapter.  I don't think so.  I'd rather "WSGI" be a term only 
>those in the know use -- it means nothing unless you expand the acronym, 
>and even then it's pretty vague.  Ultimately I hope most web programmers 
>just don't need to think about any of it.

Flat is better than nested; let's not mix other projects into this.  The 
WSGI stuff will have enough content to deserve a package of its own, and we 
don't want it to be dependent upon a bunch of "next generation" stuff 
that's not even designed yet.

>Yes, you are right.  Which means the catcher has to keep track of the 
>headers that were sent if it hopes to do anything.  In that case, it might 
>check for text/html or text/plain; if not those two, then just stop the 
>response short and log the error.  If so, and if configured to show 
>errors, then it could display them; cgitb goes to some length to make HTML 
>render correctly.
>
>That makes me think that wrapping send_response is more reasonable. Though 
>it makes error resolution in servers more complex.

I'm not sure I follow you.  The error handling in the server would look 
just like the handling in middleware, no?  In fact, this potentially sounds 
like a job for another boilerplate function in wsgi.util, or perhaps a 
class.  I imagine we might have an AbstractWSGIServer that defines basic 
start-response, write, and other operations, with abstract methods for 
sending/receiving data to and from the client, and various overrideable 
methods for policy.  The simple WSGIServer and CGI gateway would both 
derive from it, or perhaps delegate to it.
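A minimal sketch of what such a base class might look like, written for current Python 3.  Every name here (AbstractWSGIServer, send, send_headers, handle_error, InMemoryServer) is hypothetical and chosen only for illustration; nothing below is an existing wsgi.util API, and the CGI-style "Status:" line is just one possible way to emit the status.

```python
from abc import ABC, abstractmethod


class AbstractWSGIServer(ABC):
    """Hypothetical base class: shared start_response/write bookkeeping."""

    def __init__(self, application):
        self.application = application
        self.status = None
        self.headers = None
        self.headers_sent = False

    @abstractmethod
    def send(self, data):
        """Transport-specific: deliver raw bytes to the client."""

    def start_response(self, status, headers):
        # Policy assumed for this sketch: a second call is an application error.
        if self.status is not None:
            raise AssertionError("start_response() called more than once")
        self.status, self.headers = status, list(headers)
        return self.write

    def write(self, data):
        if self.status is None:
            raise AssertionError("write() called before start_response()")
        if not self.headers_sent:
            self.send_headers()
        self.send(data)

    def send_headers(self):
        lines = ["Status: %s" % self.status]
        lines += ["%s: %s" % pair for pair in self.headers]
        self.send(("\r\n".join(lines) + "\r\n\r\n").encode("latin-1"))
        self.headers_sent = True

    def handle_error(self, exc):
        """Overridable policy hook; the default simply re-raises."""
        raise exc

    def run(self, environ):
        try:
            result = self.application(environ, self.start_response)
            try:
                for chunk in result:
                    if chunk:
                        self.write(chunk)
                if not self.headers_sent:
                    self.send_headers()
            finally:
                if hasattr(result, "close"):
                    result.close()
        except Exception as exc:
            self.handle_error(exc)


class InMemoryServer(AbstractWSGIServer):
    """Toy subclass that collects output in a list instead of a socket."""

    def __init__(self, application):
        super().__init__(application)
        self.output = []

    def send(self, data):
        self.output.append(data)
```

A simple server or a CGI gateway would then only need to supply send() and whatever policy overrides it wants.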

>>Here are the methods:
>>as_string, __str__ -- format the message as a string
>>is_multipart -- returns true if payload has been set to a list
>
>Can you do this with HTTP?  I know some MIME stuff works (like 
>content-disposition: attachment; filename=blah).  Would this work too? In 
>a meaningful way?  The cgi module has some weird MIME stuff in it that I 
>don't think any web client has ever exercised.

The as_string/__str__ methods aren't really useful for HTTP, because they 
include the payload, and optionally a "unix from" line.  They'd only be 
useful in debugging, just to dump out some info.
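To illustrate the point, here is a small sketch using the standard library's email.message.Message in current Python 3 (the module layout differed in 2004).  The header and body values are made up for the example.

```python
from email.message import Message

msg = Message()
msg["Content-Type"] = "text/html"
msg.set_payload("<p>body</p>")

# as_string() emits the headers, a blank line, and then the payload,
# so it is a debugging dump rather than something to send as HTTP headers.
dump = msg.as_string()

# With unixfrom=True a "From ..." envelope line is prepended as well.
dump_with_from = msg.as_string(unixfrom=True)

# For HTTP purposes the (name, value) pairs are the useful part.
header_pairs = msg.items()
```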

>>get_unixfrom/set_unixfrom, add_payload/set_payload/get_payload/attach, 
>>get_charsets, walk -- stuff for manipulating parts of the message we 
>>don't care about.
>
>Yes.  If these are accidentally used, will it affect the as_string 
>representation?

Yes, which is why we don't need/care about them.

>>set_charset/get_charset -- sets the character set parameters of the 
>>content-type, which is actually useful.  On the down side, setting the 
>>character set sets MIME-Version, but it also sets the 
>>Content-Transfer-Encoding, so it doesn't force the server to default one.
>
>Would that start opening up the possibility of accepting Unicode to 
>write()/app_iter?

In my view, no, because then we'd force the server to know about every 
possible encoding the client and app can come up with.  If the app uses 
this, it should handle the encoding.  We might want to include a utility 
routine or two to pull what the client accepts out of HTTP_ACCEPT et al.
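One possible shape for such a utility routine, as a rough sketch only.  The function name accepted_charsets is hypothetical, the parsing is deliberately simplified (it only understands an optional "q=" parameter), and it assumes the header arrives in environ as the CGI-style HTTP_ACCEPT_CHARSET variable.

```python
def accepted_charsets(environ):
    """Return charset names from HTTP_ACCEPT_CHARSET, best quality first."""
    header = environ.get("HTTP_ACCEPT_CHARSET", "")
    ranked = []
    for item in header.split(","):
        name, _, params = item.strip().partition(";")
        name = name.strip().lower()
        if not name:
            continue
        quality = 1.0  # default when no q= parameter is given
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat an unparsable q-value as "not accepted"
        if quality > 0:
            ranked.append((quality, name))
    # Stable sort keeps the client's order among equal q-values.
    ranked.sort(key=lambda pair: -pair[0])
    return [name for _, name in ranked]
```

The application, not the server, would then pick a charset from this list and encode its own output before calling write().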

>>__len__, __getitem__, __setitem__, __delitem__, __contains__, has_key, 
>>get, keys, values, items -- case-insensitive dictionary-like interface 
>>(i.e., the stuff we mainly want)
>>get_all -- all values for a header name
>>add_header, replace_header -- more stuff we want
>
>Very good, though not hard to reimplement.

But why should everybody reimplement it, if we're not going to be in the 
stdlib till 2005?
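For reference, a short sketch of the dictionary-like interface listed above, using email.message.Message from current Python 3.  The header values are invented for the example.

```python
from email.message import Message

headers = Message()
headers["Content-Type"] = "text/html"

# add_header appends, so repeated headers such as Set-Cookie are kept.
headers.add_header("Set-Cookie", "a=1")
headers.add_header("Set-Cookie", "b=2")

# Keyword arguments become header parameters.
headers.add_header("Content-Disposition", "attachment", filename="blah")
disposition = headers["Content-Disposition"]

# Lookups are case-insensitive.
content_type = headers["content-type"]
cookies = headers.get_all("set-cookie")

# replace_header swaps the value in place, keeping the header's position.
headers.replace_header("Content-Type", "text/plain")
del headers["Content-Disposition"]

# items() yields (name, value) pairs in insertion order.
header_list = headers.items()
```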

>Okay, looking through the code briefly, I can't help but think that all 
>the complex parts are parts we don't care about.

Not so; content-type parameter setting is quite handy.  For example, if 
you're doing multipart push, you'll need set_boundary, and get_boundary 
might also be useful.
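A small sketch of the boundary helpers, again using email.message.Message from current Python 3.  The "multipart/x-mixed-replace" type and the boundary string "frame" are assumptions picked for the example; note that set_boundary requires a Content-Type header to be present already.

```python
from email.message import Message

headers = Message()
headers["Content-Type"] = "multipart/x-mixed-replace"

# set_boundary rewrites the existing Content-Type header, adding the parameter.
headers.set_boundary("frame")

boundary = headers.get_boundary()          # the unquoted boundary string
content_type = headers["Content-Type"]     # now carries the boundary parameter
main_type = headers.get_content_type()     # type/subtype without parameters
```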

>>Well, to some extent we have to look at the question of what should 
>>happen in those circumstances anyway, whether we solve the problem in 
>>that specific way or not.  Because if the application *does* call 
>>start_response more than once, the server has to be able to handle it 
>>*somehow*.  Really, the ultimate error handling *has* to be done by 
>>servers, unless they want to take the route of crashing the entire 
>>process when something bad happens.  :)
>
>Good question.  I think servers should consider that an error, but they 
>should handle that error gracefully.  Which probably means keeping a "has 
>send_response already been called" flag.
>
>Now, if I could get access to that flag from middleware... and maybe 
>access to the headers and status that have already been sent... (and 
>really, why not?  We aren't worried about streaming headers like we are 
>about bodies)

You dodged my question...  what are you going to *do* with that?  Because 
we need to formulate sensible error handling policies for the general case, 
including things like an I/O error due to the client disconnecting.

Here are possible loci of error:

    * Before start_response is called (application error)
    * During start_response (server error or application error)
    * After start_response, before first write (application error)
    * During a write (server error or application error)
    * Between writes, before return (application error)
    * After return/during iteration (application error)
    * During a post-return write (server error or application error)
    * During 'close()' (application error)

The reason those are "server or application" is because start_response and 
write can fail due to bad data passed by the application, so it's really an 
application error in that case.  The server might fail for some other 
reason, of course, like a lost client connection.

One issue here is that an application or middleware error handler needs to 
know whether the error is the application's or the server's.  It makes no 
sense for a failed write to cause a middleware error handler to attempt to 
write some more data!  It seems we need an error parameter like:

    environ['wsgi.fatal_errors'] = SomeExceptionClass1, SomeExceptionClass2,...

Such that one would use:

    try:
        # invoke child application, etc.
    except environ['wsgi.fatal_errors']:
        raise
    except:
        # regular error handling here

In other words, an application or middleware component should abort if it 
receives one of these exception types.  I'm inclined to think that 
application WSGI programming errors should be treated as fatal: if the app 
sends bad parameters to start_response or write, there's little point in 
proceeding further.
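A runnable sketch of how middleware might follow this pattern, written for current Python 3.  The 'wsgi.fatal_errors' key is the proposal from this message; the names error_catcher and ClientDisconnected are hypothetical.  The sketch only covers errors raised before the application returns, and it ignores the case where start_response has already been called.

```python
class ClientDisconnected(Exception):
    """Hypothetical server-side exception, used only for this sketch."""


def error_catcher(app):
    """Wrap app so that non-fatal errors produce a plain-text 500 page."""

    def wrapper(environ, start_response):
        # An empty tuple matches nothing, so a missing key is harmless.
        fatal = environ.get("wsgi.fatal_errors", ())
        try:
            return app(environ, start_response)
        except fatal:
            # The server has declared these unrecoverable: do not write more.
            raise
        except Exception:
            start_response("500 Internal Server Error",
                           [("Content-Type", "text/plain")])
            return [b"An error occurred.\n"]

    return wrapper
```

A server would populate the key with a tuple of its own exception classes, for example environ['wsgi.fatal_errors'] = (ClientDisconnected,).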