[Web-SIG] The rewritten WSGI pre-PEP

Wed Aug 11 19:51:56 CEST 2004

Phillip J. Eby wrote:
>> I'm not of strong opinion, but the callables definitely make it harder 
>> to understand.
> 
> 
> ...but easier to implement, since everything can be done with functions 
> and closures.
> 
> Do you think you would have difficulty creating a conforming 
> implementation, or are you just saying it took you a while to grasp how 
> you would do so?

No, I don't think it would make it any harder to implement.  Mostly it's 
just harder to talk about.

>>> ====================   =============================================
>>> Variable               Value
>>> ====================   =============================================
>>> ``wsgi.version``       The string ``"1.0"``
>>
>>
>> Would it make sense for this to be a tuple, like (1, 0), like 
>> sys.version_info?
> 
> 
> Maybe.  I'm not sure it makes any difference.  I could just as soon drop 
> versioning altogether and just use the presence or absence of feature 
> keys as the means of determining the version.

I think of the version as something of a contract.  The WSGI server 
author can't deny that they intended to implement the full spec if they 
include the version number.  It could also be used the way the HTTP 
version sometimes is:  a client that claims to be talking HTTP/1.1 must 
include a Host header.  Similarly, applications could require certain 
features if the server claims to talk, say, WSGI 1.1.
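
A minimal sketch of that kind of check, assuming the application wants to accept either the string form ("1.0") or the tuple form ((1, 0)) discussed above.  The helper name require_wsgi_version is made up for illustration and is not part of the pre-PEP:

```python
def require_wsgi_version(environ, minimum=(1, 0)):
    """Return the server's claimed version as a tuple, or raise if too old.

    Hypothetical helper: accepts both the "1.0" string form and the
    (1, 0) tuple form, since the thread has not settled on one.
    """
    raw = environ.get("wsgi.version")
    if raw is None:
        raise RuntimeError("server did not declare wsgi.version")
    if isinstance(raw, str):
        version = tuple(int(part) for part in raw.split("."))
    else:
        version = tuple(raw)
    # Tuples compare element by element, so (1, 0) < (1, 1) < (2, 0).
    if version < minimum:
        raise RuntimeError(
            "WSGI %s is older than required %s" % (version, minimum)
        )
    return version
```

Comparing tuples is the main practical argument for the sys.version_info style:  a string comparison would order "1.10" before "1.9".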

>> Another useful one I brought up last time would be some indication 
>> that the application was definitely not going to be reused, i.e., it's 
>> being invoked in a CGI context.  The performance issues there are 
>> completely different than in other environments.
> 
> Okay...  how about 'wsgi.last_call', which is a true value if this 
> invocation of the application will *probably* be the last?  IOW, the 
> server need not guarantee that the app will *not* be called again; this 
> is just a "suggestion".

Yes, that sounds good.

>> Should there be any policy about path segments containing //, ./, or ../?
> 
> 
> What do you have in mind?

I don't know.  Normalization, perhaps -- remove empty path segments, and 
resolve any relative paths.  Which would mean something like:

import re

path = re.sub(r'//+', '/', path)
path = re.sub(r'/\./', '/', path)
path = re.sub(r'/[^/]+/\.\./', '/', path)

I dunno... that should probably be up to the application.
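
If an application did want to normalize, a segment-based version is one way to avoid the edge cases regular expressions run into (consecutive '..' segments, a trailing '/.').  This is only a sketch; it assumes absolute paths that have not been URL-decoded yet, and normalize_path is an illustrative name:

```python
def normalize_path(path):
    """Collapse empty, '.' and '..' segments in an absolute URL path."""
    segments = []
    for segment in path.split("/"):
        if segment in ("", "."):
            # Empty segments come from '//' and from the leading slash.
            continue
        if segment == "..":
            # Never climb above the root.
            if segments:
                segments.pop()
            continue
        segments.append(segment)
    normalized = "/" + "/".join(segments)
    # Keep a trailing slash, since /a/ and /a may be different resources.
    if len(normalized) > 1 and path.endswith(("/", "/.", "/..")):
        normalized += "/"
    return normalized
```

For example, normalize_path("/a//b/./c/../d") gives "/a/b/d".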

>> Hmm... what should the server do if it gets a Location header with no 
>> Status?
> 
> There's no such thing; there's always a status under this spec.  
> However, what happens to the HTTP headers passed to 'start_response()' 
> could perhaps be made clearer.

Okay, that's fine, though any internal redirect would then have to be 
done through an extension.  In practice internal redirects are kind of 
complicated to deal with anyway:  lots of linking confusion, lost 
headers, etc.

>> The CGI spec says servers should change the current working directory 
>> to the resource being run.  I think this won't be that common for WSGI 
>> servers, though.
> 
> Do you think this needs to be stated?  WSGI only references CGI with 
> respect to environment variables.

Probably it's no big deal.

>> This is from the CGI spec:
>>
>>    Scripts MUST be prepared to handle URL-encoded values in
>>    metavariables. In addition, they MUST recognise both "+" and
>>    "%20" in URL-encoded quantities as representing the space
>>    character. (See section 3.1.)
>>
>> That seems weird; I've never URL-decoded values besides QUERY_STRING.
> 
> 
> That's probably an addition to the 1.1 spec.  However, ISTM I've seen 
> code in Zope that expects to decode path segments.  I could be wrong.

I would assume in that case it was decoding something that was encoded 
on the server side.  E.g.:

<a href="http://whatever.com/documents/I%2FO%20library">I/O library</a>

As opposed to the CGI gateway encoding any of its values.  Even 
QUERY_STRING is encoded by the browser, not the gateway.  Maybe this is 
just a case of HTTP issues leaking into the CGI spec.
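
For reference, this is what the two decodings the spec mentions look like on that example.  The sketch uses urllib.parse from current Python (in Python 2 the same functions live in urllib); the variable names are only for illustration:

```python
from urllib.parse import unquote, unquote_plus

# Path-style decoding: only %XX escapes are interpreted.
decoded_segment = unquote("I%2FO%20library")

# Form/query-style decoding: '+' is also treated as a space.
decoded_plus = unquote_plus("I%2FO+library")

# Plain unquote() leaves '+' alone, which is why the two differ.
decoded_literal_plus = unquote("I%2FO+library")

print(decoded_segment)       # I/O library
print(decoded_plus)          # I/O library
print(decoded_literal_plus)  # I/O+library
```

Note that decoding the segment turns %2F into a real slash, so it has to happen after the path has been split into segments, not before.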

>> The CGI spec doesn't seem to mention REQUEST_URI.  That's surprising. 
>> Here's the Apache CGI variables it doesn't mention:
>>
>> SERVER_SIGNATURE (pretty boring)
>> SERVER_ADDR (seems very basic)
>> DOCUMENT_ROOT (doesn't seem appropriate)
>> SCRIPT_FILENAME (also often not appropriate)
>> SERVER_ADMIN (boring)
>> SCRIPT_URI
>> REQUEST_URI (I don't understand the distinction)
>> REMOTE_PORT (boring, though I guess if you wanted to add an ident 
>> check it would be useful)
>> UNIQUE_ID (not needed)
>>
>>
>> I think SERVER_ADDR and REMOTE_PORT are easy to add, and potentially 
>> useful.  SCRIPT_URI and REQUEST_URI might be good.
> 
> 
> Sigh.  I guess maybe I'll have to go back and pick out variables one by 
> one.  However, I don't think *any* of the variables you listed should be 
> required to exist.  For one thing, it's much easier to write middleware 
> if you only have to munge SCRIPT_NAME and PATH_INFO during traversals.

I've had constant problems trying to backtrack through middleware (like 
mod_rewrite) to figure out how to create a URL that is internal to the 
application.  I'd like to keep around some artifact indicating what the 
original URI was (e.g., REQUEST_URI); something that middleware 
specifically should not rewrite.  Nor is there any real reason for it to 
be rewritten.
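
A small sketch of the behaviour I have in mind, assuming the server put REQUEST_URI in the environment in the first place.  shift_path is a made-up piece of middleware that mounts an application under a prefix; it munges only SCRIPT_NAME and PATH_INFO and passes everything else through:

```python
def shift_path(app, prefix):
    """Mount `app` under `prefix` by moving it from PATH_INFO to SCRIPT_NAME."""
    def middleware(environ, start_response):
        path = environ.get("PATH_INFO", "")
        if path == prefix or path.startswith(prefix + "/"):
            # Work on a copy so the caller's environ is left alone.
            environ = dict(environ)
            environ["SCRIPT_NAME"] = environ.get("SCRIPT_NAME", "") + prefix
            environ["PATH_INFO"] = path[len(prefix):]
            # REQUEST_URI (if present) is deliberately not touched, so the
            # application can still see what the client originally asked for.
        return app(environ, start_response)
    return middleware
```

With PATH_INFO of '/blog/2004/08' and a prefix of '/blog', the wrapped application sees SCRIPT_NAME '/blog' and PATH_INFO '/2004/08', while REQUEST_URI still holds the original value.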

SERVER_ADDR and REMOTE_PORT also don't require any rewriting, and should 
just be passed through any middleware.  Hmm... the CGI spec also leaves 
out any SSL variables.  Those are, of course, all optional.  But if the 
user connected via SSL, I think HTTPS=on should be required.
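
On the application side, relying on that variable could look something like the sketch below.  It assumes the Apache convention of HTTPS=on; treating "1" as true as well is my own assumption, and url_scheme is just an illustrative helper name:

```python
def url_scheme(environ):
    """Guess the request scheme from the Apache-style HTTPS variable."""
    # Assumption: "on" is the Apache value; "1" is accepted as a courtesy.
    if environ.get("HTTPS", "off").lower() in ("on", "1"):
        return "https"
    return "http"
```

Without a required variable, an application has no reliable way to build absolute URLs back to itself.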

-- 
Ian Bicking  /  ianb at colorstudy.com  /  http://blog.ianbicking.org