[Web-SIG] Stuff left to be done on WSGI

Mon Aug 30 06:12:18 CEST 2004

Phillip J. Eby wrote:
> At 11:51 PM 8/27/04 -0500, Ian Bicking wrote:
> 
>> I don't know if we need deeper hierarchy than that.  E.g., 
>> web.wsgi.cgiadapter.  I don't think so.  I'd rather "WSGI" be a term 
>> only those in the know use -- it means nothing unless you expand the 
>> acronym, and even then it's pretty vague.  Ultimately I hope most web 
>> programmers just don't need to think about any of it.
> 
> 
> Flat is better than nested; let's not mix other projects into this.  The 
> WSGI stuff will have enough content to deserve a package of its own, and 
> we don't want it to be dependent upon a bunch of "next generation" stuff 
> that's not even designed yet.

Will it really?  And how will it be organized?  There are some utility 
functions, which don't deserve a module.  There's WSGIHTTPServer, based 
on BaseHTTPServer.  And maybe some CGI WSGI server.

I imagine other things could come along, but not right away, and where 
would they go?  Added to some top-level module?  A new module?

I also *really* dislike the name wsgi for a module.  It's a fine name 
for discussing this, but I'm really opposed to it becoming a name used 
more widely.  Not because I think there's a better name, but because the 
function is important and the name isn't.  One benefit of this being an 
approved PEP is that we don't have to qualify it as one-of-many by giving 
it a distinguishing name.

>> Yes, you are right.  Which means the catcher has to keep track of the 
>> headers that were sent if it hopes to do anything.  In that case, it 
>> might check for text/html or text/plain; if not those two, then just 
>> stop the response short and log the error.  If so, and if configured 
>> to show errors, then it could display them; cgitb goes to some length 
>> to make HTML render correctly.
>>
>> That makes me think that wrapping send_response is more reasonable. 
>> Though it makes error resolution in servers more complex.
> 
> 
> I'm not sure I follow you.  The error handling in the server would look 
> just like the handling in middleware, no?  In fact, this potentially 
> sounds like a job for another boilerplate function in wsgi.util, or 
> perhaps a class.  I imagine we might have an AbstractWSGIServer that 
> defines basic start-response, write, and other operations, with abstract 
> methods for sending/receiving data to and from the client, and various 
> overrideable methods for policy.  The simple WSGIServer and CGI gateway 
> would both derive from it, or perhaps delegate to it.

To me that feels like it makes implementation more complicated, rather 
than less.  Maybe not really, but I think it will *feel* more 
complicated.  I think a good example is more helpful to authors.  All 
these issues are very much part of the control flow, and abstracting 
control flow leads (IMHO) to confusing class structures.

>>> set_charset/get_charset -- sets the character set parameters of the 
>>> content-type, which is actually useful.  On the down side, setting 
>>> the character set sets MIME-Version, but it also sets the 
>>> Content-Transfer-Encoding, so it doesn't force the server to default 
>>> one.
>>
>>
>> Would that start opening up the possibility of accepting Unicode to 
>> write()/app_iter?
> 
> 
> In my view, no, because then we'd force the server to know about every 
> possible encoding the client and app can come up with.  If the app uses 
> this, it should handle the encoding.  We might want to include a utility 
> routine or two to pull what the client accepts out of HTTP_ACCEPT et al.

Python seems to be pretty good at dealing with a lot of different 
encodings.  A lot of work on this has gone into the base Python 
distribution -- I don't think there's any better source of code on encoding.

It opens up a big can of worms, so I don't mind ignoring encoding, but 
maybe that's just because I'm American and I'm lazy and usually ignore 
encoding, so it's mysterious to me.
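
To make the "the app should handle the encoding" position concrete, here is 
a minimal sketch of an application that encodes its own text and announces 
the charset in Content-Type, so the server only ever sees bytes.  The 
function name and the sample text are made up for illustration, and it is 
written for current Python rather than the interpreter of this thread.

```python
def encoded_app(environ, start_response):
    """Hypothetical app: encodes its own output, server never sees text."""
    text = "Gr\u00fc\u00dfe, \u4e16\u754c\n"  # any non-ASCII text
    charset = "utf-8"                          # the app's choice, not the server's
    body = text.encode(charset)                # encode before handing to the server
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=%s" % charset),
        ("Content-Length", str(len(body))),    # length in bytes, not characters
    ])
    return [body]
```

The server stays ignorant of encodings; the cost is that every application 
(or a shared utility) has to do this step itself.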

>>> __len__, __getitem__, __setitem__, __delitem__, __contains__, 
>>> has_key, get, keys, values, items -- case-insensitive dictionary-like 
>>> interface (i.e., the stuff we mainly want)
>>> get_all -- all values for a header name
>>> add_header, replace_header -- more stuff we want
>>
>>
>> Very good, though not hard to reimplement.
> 
> 
> But why should everybody reimplement it, if we're not going to be in the 
> stdlib till 2005?

Well, if we already have utility functions, this is just a utility 
class.  And it would be very small and easy to understand.  Smaller 
and easier to understand than email.Message, certainly, and with no 
distracting vestigial pieces.
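
As a rough idea of how small such a class could be, here is a sketch of a 
case-insensitive wrapper around a list of (name, value) tuples.  The class 
name and the exact method set are assumptions for illustration only, not a 
proposed API, and it is written for current Python.

```python
class Headers:
    """Hypothetical case-insensitive view of a list of (name, value) tuples."""

    def __init__(self, items=None):
        self._items = list(items or [])

    def __len__(self):
        return len(self._items)

    def __contains__(self, name):
        return self.get(name) is not None

    def __getitem__(self, name):
        value = self.get(name)
        if value is None:
            raise KeyError(name)
        return value

    def __setitem__(self, name, value):
        # Replace every existing header of that name with a single value.
        del self[name]
        self._items.append((name, value))

    def __delitem__(self, name):
        # Like email.Message, deleting a missing header is not an error.
        name = name.lower()
        self._items = [(k, v) for k, v in self._items if k.lower() != name]

    def get(self, name, default=None):
        """Return the first value for name, or default."""
        name = name.lower()
        for k, v in self._items:
            if k.lower() == name:
                return v
        return default

    def get_all(self, name):
        """Return all values for name, in order."""
        name = name.lower()
        return [v for k, v in self._items if k.lower() == name]

    def add_header(self, name, value):
        """Append a header without touching existing ones of the same name."""
        self._items.append((name, value))

    def items(self):
        return list(self._items)
```

This leaves out the content-type parameter handling (set_charset, 
set_boundary and friends), which is the part email.Message does that this 
sketch does not.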

>> Okay, looking through the code briefly, I can't help but think that 
>> all the complex parts are parts we don't care about.
> 
> 
> Not so; content-type parameter setting is quite handy.  For example, if 
> you're doing multipart push, you'll need e.g. set_boundary and 
> get_boundary might also be useful.
> 
> 
>>> Well, to some extent we have to look at the question of what should 
>>> happen in those circumstances anyway, whether we solve the problem in 
>>> that specific way or not.  Because if the application *does* call 
>>> start_response more than once, the server has to be able to handle it 
>>> *somehow*.  Really, the ultimate error handling *has* to be done by 
>>> servers, unless they want to take the route of crashing the entire 
>>> process when something bad happens.  :)
>>
>>
>> Good question.  I think servers should consider that an error, but 
>> they should handle that error gracefully.  Which probably means 
>> keeping a "has send_response already been called" flag.
>>
>> Now, if I could get access to that flag from middleware... and maybe 
>> access to the headers and status that have already been sent... (and 
>> really, why not?  We aren't worried about streaming headers like we 
>> are about bodies)
> 
> 
> You dodged my question...  what are you going to *do* with that?  
> Because we need to formulate sensible error handling policies for the 
> general case, including things like an I/O error due to the client 
> disconnecting.

Well, in some cases I would try to display errors to the client.  Though 
maybe a class of errors -- particularly those that happen during the 
iteration phase, or after start_response -- could just go to a log. 
OTOH, I'd want to show *some* indication to the client that an error has 
occurred, and the response is incomplete, at least for human-readable 
content (text/html and maybe text/plain).

But not in all cases, like I/O error.  OTOH, I might log errors *only* 
when I couldn't display them to the client (during development).
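
A minimal sketch of that kind of catcher, assuming a start_response that 
takes an optional third exc_info argument (which is not what the draft 
under discussion has) and current Python.  ErrorCatcher, content_type and 
the show_errors switch are hypothetical names.  The middleware remembers the 
status and headers, always logs to wsgi.errors, and only adds a visible 
notice for text/html or text/plain.  It does not try to tell server I/O 
errors apart from application errors.

```python
import sys
import traceback


def content_type(headers):
    """Return the bare media type from a list of (name, value) headers."""
    for name, value in headers:
        if name.lower() == "content-type":
            return value.split(";")[0].strip().lower()
    return None


class ErrorCatcher:
    """Hypothetical middleware; not part of any WSGI spec or library."""

    def __init__(self, app, show_errors=False):
        self.app = app
        self.show_errors = show_errors

    def __call__(self, environ, start_response):
        sent = {}  # filled in once the application calls start_response

        def recording_start_response(status, headers, exc_info=None):
            sent["status"] = status
            sent["headers"] = list(headers)
            return start_response(status, headers, exc_info)

        try:
            for chunk in self.app(environ, recording_start_response):
                yield chunk
        except Exception:
            # Always log; wsgi.errors is the error stream the server provides.
            traceback.print_exc(file=environ["wsgi.errors"])
            if not sent:
                # Nothing sent yet: a full error response is still possible.
                start_response("500 Internal Server Error",
                               [("Content-Type", "text/plain")],
                               sys.exc_info())
                yield b"A technical problem occurred.\n"
            elif self.show_errors and content_type(sent["headers"]) in (
                    "text/html", "text/plain"):
                # Headers are gone; one last write for human-readable content.
                yield b"\n[error: response is incomplete]\n"
            # Any other content type: stop the response short, log only.
```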

> Here are possible loci of error:
> 
>    * Before start_response is called (application error)

Easy to handle.  Display a traceback, or a technical-problems error 
message and log the error.

>    * During start_response (server error or application error)

What application errors are you thinking of?  Like invoking 
start_response incorrectly?

Server errors should probably be handled by the server.  It might be 
nice if the server always raised a single exception (say, 
WSGIServerError), so a start_response definition might look like:

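def start_response(status, headers):
    try:
        # send_headers is a placeholder for the server's real work
        send_headers(status, headers)
    except ServerIOError:
        # log_error is a placeholder for server-specific cleanup
        log_error()
        raise WSGIServerError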

And applications shouldn't catch (or should re-raise) a server error.

>    * After start_response, before first write  (application error)

I'd like the option here to display an error to the client, dependent on 
the content-type.

>    * During a write (server error or application error)

Another WSGIServerError?

>    * Between writes, before return (application error)

Depending on content-type, a last write would be good.

>    * After return/during iteration (application error)

Again, depending on content-type, a last write (well, iteration) would 
be nice.  Less important generally.

>    * During a post-return write (server error or application error)

I'm not sure what you're thinking here?

>    * During 'close()' (application error)

Logged to wsgi.errors, nothing else.

> The reason those are "server or application" is because start_response 
> and write can fail due to bad data passed by the application, so it's 
> really an application error in that case.  The server might fail for 
> some other reason, of course, like a lost client connection.
> 
> One issue here is that an application or middleware error handler needs 
> to know whether the error is the application's or the server's.  It 
> makes no sense for a failed write to cause a middleware error handler to 
> attempt to write some more data!  It seems we need an error parameter like:
> 
>    environ['wsgi.fatal_errors'] = SomeExceptionClass1, SomeExceptionClass2,...
> 
> Such that one would use:
> 
>    try:
>        # invoke child application, etc.
>    except environ['wsgi.fatal_errors']:
>        raise
>    except:
>        # regular error handling here
> 
> In other words, an application or middleware component should abort if 
> it receives one of these exception types.  I'm inclined to think that 
> application WSGI programming errors should be treated as fatal: if the 
> app sends bad parameters to start_response or write, there's little 
> point in proceeding further.

Hmm... that would work too.  Then the type of the exception wouldn't be 
lost, though servers would also be able to encode the type inside a 
single exception.  OTOH, by using a tuple there, you could avoid 
requiring any wsgi module which defines this particular exception.

I would probably call these "server_errors" rather than "fatal_errors", 
though I guess it amounts to the same thing.
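
For reference, a runnable sketch of the tuple approach, assuming the 
'wsgi.fatal_errors' key proposed above (it is only a proposal in this 
thread, not part of any spec) and current Python.  ClientDisconnected and 
error_middleware are made-up names.

```python
class ClientDisconnected(Exception):
    """Hypothetical server-side exception: the client went away."""


def error_middleware(app):
    """Wrap app so ordinary errors become a 500, fatal ones pass through."""
    def wrapper(environ, start_response):
        # An empty tuple catches nothing, so a server that does not set
        # the key simply gets the regular handling for every error.
        fatal = environ.get("wsgi.fatal_errors", ())
        try:
            return app(environ, start_response)
        except fatal:
            raise  # the server's problem; do not try to write anything
        except Exception:
            start_response("500 Internal Server Error",
                           [("Content-Type", "text/plain")])
            return [b"Internal error\n"]
    return wrapper
```

Because the middleware only looks the tuple up in environ, it never has to 
import the server's exception classes.  Note that this only covers errors 
raised while the application callable itself is running, not ones raised 
later during iteration.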

-- 
Ian Bicking  /  ianb at colorstudy.com  / http://blog.ianbicking.org