[Web-SIG] Stuff left to be done on WSGI

Sat Aug 28 05:13:43 CEST 2004

At 07:00 PM 8/27/04 -0500, Ian Bicking wrote:
>Phillip J. Eby wrote:
>>I don't know if it's possible for us to get these items together in time 
>>for 2.4; if we don't, we don't.
>
>I can't imagine we would make it.

You're probably right; it's just so tantalizingly close, as AMK mentioned.

>I would hope that we can come to some consensus and produce something 
>useable before 2.5, with the understanding that it will be included in 
>2.5.  I would kind of like to see a "web" package.

I think we'll have better luck with a 'wsgi' package, but I could be 
wrong.  'web' just seems like a nuisance attractor for all sorts of 
unproductive bickering on so many levels.

On a more immediate practical level, we'd be crazy to try to claim 'web' 
for a third-party package that we want to propose for the stdlib, but a 
package named 'wsgi' would be more than fair game.

>>There's little harm in having a separate 'wsgi' distribution until 2.5 
>>rolls around.  I'm thinking the package should include:
>>  * BaseHTTPServer-based WSGI server
>>  * CGI-based WSGI gateway (run WSGI apps under CGI)
>
>You've noted these are missing error handling.  What kind were you 
>thinking of specifically?
>
>There's exception handling, which seems straightforward.

Well, to be honest, I haven't a clue what one does about errors *after* the 
headers are written.  You can't send anything useful to the client, because 
the status line has already gone out.

If you sent a Content-Length, you can close the connection before that many 
bytes have been sent, and it's a fair guess the client will know something's 
wrong.  If you *didn't* send a Content-Length and you close the connection, 
the client gets an incomplete file and may not know it.  Sending an error 
message once 'write()' has been called will garble the output.

All of these options are especially unsatisfactory when binary files are 
involved, where "unsatisfactory" could mean anything from "annoying" to 
"catastrophic" (e.g. garbling an executable).
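To make those cases concrete, here is a minimal sketch of a server loop, written for a current Python (bytes bodies) rather than the 2.x of the time. The name `run_app` and the `wire` list standing in for the socket are hypothetical. The loop defers sending headers until the first body bytes, so an exception raised before that point can still become a 500 response; once headers are out, all it can do is stop and drop the connection.

```python
def run_app(app, environ, wire):
    """Hypothetical minimal server loop; `wire` is a list standing in for the socket."""
    state = {"status": None, "headers": None, "headers_sent": False}

    def start_response(status, headers):
        # Just record the values; nothing goes to the client yet.
        state["status"], state["headers"] = status, headers
        return write

    def write(data):
        if not state["headers_sent"]:
            # Headers are sent lazily, right before the first body bytes.
            lines = [f"HTTP/1.0 {state['status']}"]
            lines += [f"{name}: {value}" for name, value in state["headers"]]
            wire.append(("\r\n".join(lines) + "\r\n\r\n").encode("latin-1"))
            state["headers_sent"] = True
        if data:
            wire.append(data)

    try:
        result = app(environ, start_response)
        try:
            for chunk in result:
                write(chunk)
            write(b"")  # ensure headers go out even when the body is empty
        finally:
            if hasattr(result, "close"):
                result.close()
    except Exception:
        if not state["headers_sent"]:
            # Nothing has reached the client, so a clean error response is still possible.
            state["status"] = "500 Internal Server Error"
            state["headers"] = [("Content-Type", "text/plain")]
            write(b"Internal Server Error")
            return "sent 500"
        # Status and headers are already out: all that's left is to stop and close.
        # With a Content-Length the client can notice the short body; without one it may not.
        return "aborted"
    return "ok"
```

An application that declares a Content-Length of 10, yields 5 bytes and then raises lands in the "aborted" branch: the client has a 200 status and half a body, which is exactly the unsatisfactory situation described above.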

>   Spec compliance?  Certainly an anal version of these servers should be 
> written, that checks every type passed around, looks for common mistakes, 
> etc.  I don't know if the anal and the useable version need to be the 
> same thing.

I wasn't even addressing spec compliance, although test suites for all the 
implementations, factored so that they could be used as a basis for testing 
other implementations, would certainly be nice.
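A strict checking wrapper of the kind Ian describes might start out like the sketch below. `validating` and `check` are hypothetical names, and the checks shown (status format, a list of two-string tuples, no "Status" entry in the header list, bytes body chunks) are illustrative assumptions for a current Python, not a complete list. Because it wraps an application rather than living inside one server, the same checks can run under any server being tested.

```python
import re

_STATUS_RE = re.compile(r"\d{3} \S")

def check(condition, message):
    # Raise explicitly so the checks still run under "python -O".
    if not condition:
        raise AssertionError(message)

def validating(app):
    """Hypothetical strict wrapper that checks what an application hands back."""
    def wrapper(environ, start_response):
        def checked_start_response(status, headers):
            check(isinstance(status, str) and _STATUS_RE.match(status),
                  f"bad status: {status!r}")
            check(isinstance(headers, list), "headers must be a list")
            for item in headers:
                check(isinstance(item, tuple) and len(item) == 2
                      and all(isinstance(part, str) for part in item),
                      f"bad header: {item!r}")
                check(item[0].lower() != "status",
                      "status belongs in the status argument, not the header list")
            return start_response(status, headers)

        result = app(environ, checked_start_response)
        try:
            body = list(result)  # buffered to keep the sketch short
        finally:
            if hasattr(result, "close"):
                result.close()
        for chunk in body:
            check(isinstance(chunk, bytes), f"body chunk must be bytes: {chunk!r}")
        return body
    return wrapper
```

Whether this "anal" version and the usable server should be the same code is left open here; keeping it as a separate wrapper is one way to avoid deciding.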

>Two models -- one that optimistically tries to load the cgi module in a 
>fake environment (what I did), plus another that actually runs any CGI script.

I'm not following what the difference is, exactly, but I guess we'll need 
to get into the design more.

>If we use email.Message, using a status header seems fine.  If not, I 
>think it should be separate -- I don't want to search a list for the 
>status header.

Right, that's all I was thinking.

>I don't think the utility functions are a big deal at all, and I worry 
>that there are some gotchas to email.Message, specifically where it is 
>intended for email.  So I'm certainly not adamantly opposed to 
>email.Message, but I'm not adamantly for it either.  I'd rather see a 
>superclass of email.Message (such a superclass does not yet exist, but 
>should be easy to write/extract) that is more minimal.

Why don't you take a look at the code?  I have.  Here are the methods:

as_string, __str__ -- format the message as a string

is_multipart -- returns true if payload has been set to a list

get_unixfrom/set_unixfrom, add_payload/set_payload/get_payload/attach, 
get_charsets, walk -- stuff for manipulating parts of the message we don't 
care about.

set_charset/get_charset -- sets the character set parameter of the 
content-type, which is actually useful.  On the downside, setting the 
character set also sets MIME-Version, but it sets the 
Content-Transfer-Encoding too, so the server isn't forced to default one.

__len__, __getitem__, __setitem__, __delitem__, __contains__, has_key, get, 
keys, values, items -- case-insensitive dictionary-like interface (i.e., 
the stuff we mainly want)

get_all -- all values for a header name

add_header, replace_header -- more stuff we want

get_type, get_main_type, get_subtype, get_content_type, 
get_content_maintype, get_content_subtype, get_param, 
get_params, set_param, del_param, set_type, get_boundary, set_boundary, 
get_content_charset -- miscellaneous content-type analysis and 
manipulation.  Not necessarily very helpful, except maybe for 
middleware.  But they hardly hurt.

get_filename -- extract filename from Content-Disposition if present.  Not 
particularly helpful, but also not damaging in any way.

Perhaps more eyes should look at this, but I haven't found anything in here 
that's damaging or even annoying apart from setting MIME-Version if it's 
not there and the content-type is touched.
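For anyone who wants to try the methods listed above, here is a small sketch using `email.message.Message` from a current Python (the module path differs from the 2004 `email.Message`). The status stays a separate value, as discussed earlier, and the Message holds only headers. The final `set_charset` call shows the MIME-Version side effect mentioned above.

```python
from email.message import Message

status = "200 OK"  # kept separate, so nobody has to search the header list for it

headers = Message()
headers["Content-Type"] = "text/html"
headers.add_header("Set-Cookie", "a=1")
headers.add_header("Set-Cookie", "b=2")

# Case-insensitive, dictionary-like access.
print(headers["content-type"])        # text/html
print("CONTENT-TYPE" in headers)      # True

# get_all returns every value for a repeated header name.
print(headers.get_all("set-cookie"))  # ['a=1', 'b=2']

# replace_header changes the first match in place, keeping its position.
headers.replace_header("Content-Type", "text/plain")

# items() gives the kind of (name, value) list a server would write out.
response_headers = headers.items()
print(response_headers)

# Touching the charset also adds a MIME-Version header if none is present.
headers.set_charset("utf-8")
print("MIME-Version" in headers)      # True
```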

>But, I don't know.  I'm still up in the air.  Really, I just don't like 
>wrapping start_response, from a mechanical point of view.  It feels 
>awkward to me.  I wish I could just query the server as to what point in 
>the response it is at.

Well, we could offer a facility for that, but first I'd like to explore 
what error handling should *do* in different situations.
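For comparison, this is roughly what the start_response wrapping Ian finds awkward looks like: a hypothetical error-catching middleware (`catch_errors` is an illustrative name) that intercepts start_response and write purely to learn whether output has begun. It buffers the body so the status can still be changed after an exception; that buffering is an assumption of this sketch and gives up streaming.

```python
def catch_errors(app):
    """Hypothetical middleware that turns early exceptions into a 500 response."""
    def middleware(environ, start_response):
        pending = {}
        state = {"server_write": None}

        def begin():
            # Hand the buffered status and headers to the server exactly once.
            if state["server_write"] is None:
                state["server_write"] = start_response(pending["status"], pending["headers"])

        def write(data):
            # The app pushed output directly, so the response has started for real.
            begin()
            state["server_write"](data)

        def capture_start(status, headers):
            # Until output starts, just remember the latest status and headers.
            pending["status"], pending["headers"] = status, headers
            return write

        try:
            result = app(environ, capture_start)
            try:
                body = list(result)
            finally:
                if hasattr(result, "close"):
                    result.close()
        except Exception:
            if state["server_write"] is not None:
                raise  # output already reached the server; leave it to the server
            pending["status"] = "500 Internal Server Error"
            pending["headers"] = [("Content-Type", "text/plain")]
            body = [b"Internal Server Error"]
        begin()
        return body
    return middleware
```

All of that bookkeeping exists only to answer "has the response started yet?", which is the question Ian would rather put to the server directly.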

>>The only other thing that comes to mind is requiring servers to support 
>>multiple 'start_response' calls in some way that makes sense for 
>>exception handlers, while requiring it to still work in the case where an 
>>extension API has already been used for output.
>
>That seems too hard.

Well, to some extent we have to look at the question of what should happen 
in those circumstances anyway, whether we solve the problem in that 
specific way or not.  Because if the application *does* call start_response 
more than once, the server has to be able to handle it *somehow*.  Really, 
the ultimate error handling *has* to be done by servers, unless they want 
to take the route of crashing the entire process when something bad 
happens.  :)
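One possible server-side rule, sketched with a hypothetical `ResponseHeaders` class: repeated start_response calls are accepted while nothing has reached the client, with the latest call winning, and rejected with an exception once headers have gone out. This is an assumption about what "makes sense", not a settled design; raising is only one way a server could handle the late case.

```python
class ResponseHeaders:
    """Hypothetical server-side bookkeeping for repeated start_response calls."""

    def __init__(self):
        self.status = None
        self.headers = None
        self.sent = False

    def start_response(self, status, headers):
        if self.sent:
            # Too late: the client already has a status line. The server still
            # has to do *something*; refusing loudly is one option.
            raise RuntimeError("start_response called after headers were sent")
        # Before anything is on the wire, a later call simply replaces the earlier one.
        self.status, self.headers = status, headers

    def send_headers(self):
        """Called by the server just before the first body bytes go out."""
        self.sent = True
        return self.status, self.headers
```

An exception handler that calls start_response a second time before any output therefore gets its error status through, while one that fires after output has started is turned into an error the server itself must deal with.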