[Web-SIG] The rewritten WSGI pre-PEP
Ian Bicking
ianb at colorstudy.com
Wed Aug 11 21:20:23 CEST 2004
Phillip J. Eby wrote:
>> I've had constant problems trying to backtrack through middleware
>> (like mod_rewrite) to figure out how to create a URL that is internal
>> to the application. I'd like to keep around some artifact indicating
>> what the original URI was (e.g., REQUEST_URI); something that
>> middleware specifically should not rewrite. Nor is there any real
>> reason for it to be rewritten.
>
>
> Hm. And SCRIPT_NAME is insufficient for this? I think I can see why
> mod_rewrite would make this a problem, but ISTM that Python middleware
> component could do rewrites that left SCRIPT_NAME "logically correct".
I suppose it could, i.e., http:// + SERVER_NAME + ":" + SERVER_PORT +
SCRIPT_NAME + PATH_INFO + "?" + QUERY_STRING is the complete URL. If
that's the expectation, then that too should be in the spec. But, if
only because of the existence of mod_rewrite, that's not likely to be
true. REQUEST_URI just seems like a natural part of the request
description -- it says exactly what the client asked for, without the
extra meaning that SCRIPT_NAME and PATH_INFO have.
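
As a rough sketch of what a "logically correct" rewrite in Python middleware might look like: the helper below moves a path prefix from PATH_INFO onto SCRIPT_NAME, so their concatenation stays the same, and it never touches REQUEST_URI. The name shift_prefix and its signature are made up for illustration; nothing like it is in the pre-PEP.

```python
def shift_prefix(environ, prefix):
    """Move `prefix` from the front of PATH_INFO onto the end of SCRIPT_NAME.

    Hypothetical helper for illustration only. Returns True if the prefix
    matched and the environ was updated, False otherwise.
    """
    path = environ.get('PATH_INFO', '')
    # Only match on a whole path segment: '/blog' must not match '/blogger'.
    if path != prefix and not path.startswith(prefix + '/'):
        return False
    environ['SCRIPT_NAME'] = environ.get('SCRIPT_NAME', '') + prefix
    environ['PATH_INFO'] = path[len(prefix):]
    # REQUEST_URI, if the server supplied it, is deliberately left alone.
    return True


environ = {
    'SCRIPT_NAME': '/app',
    'PATH_INFO': '/blog/2004/08',
    'REQUEST_URI': '/app/blog/2004/08',  # assumed to be set by the server
}
shifted = shift_prefix(environ, '/blog')
```

After the call, SCRIPT_NAME + PATH_INFO is still the path the client requested; only the split point has moved. A rewrite done outside Python (mod_rewrite, say) gives no such guarantee.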
In the end I've come to dislike mod_rewrite because of these issues, but
given its existence...
>> SERVER_ADDR and REMOTE_PORT also don't require any rewriting, and
>> should just be passed through any middleware.
>
>
> Are you sure? SERVER_ADDR might be different if the request is
> forwarded to another machine, mightn't it? I seem to recall that
> mod_backhand does some stuff with this. In any case it highlights the
> trouble with trying to precisely pin down things that are already
> inherently implementation-defined. Unfortunately, WSGI isn't really
> going to eliminate all the environment introspecting and munging code
> that lives in the various existing apps and frameworks today.
If SERVER_ADDR needs to be rewritten, then SERVER_NAME would be
rewritten at the same time.
I think I've also seen some inconsistencies between SERVER_NAME and
HTTP_HOST. SERVER_NAME tends to be the canonical name of the host,
ignoring any named virtual hosts (at least in Apache). So really if you
are going to construct a URL it should use (environ.get("HTTP_HOST") or
environ.get("SERVER_NAME")).
Maybe it would be good to include how the URL is supposed to be split
up, at least informationally. Like, you can reconstruct the URL by doing:

    if environ.get('HTTPS') == 'on':
        url = 'https://'
        default_port = '443'
    else:
        url = 'http://'
        default_port = '80'
    if environ.get('HTTP_HOST'):
        # The Host header already carries any non-default port,
        # so don't append SERVER_PORT a second time.
        url += environ['HTTP_HOST']
    else:
        url += environ['SERVER_NAME']
        if environ['SERVER_PORT'] != default_port:
            url += ':' + environ['SERVER_PORT']
    url += environ['SCRIPT_NAME']
    url += environ.get('PATH_INFO', '')
    if environ.get('QUERY_STRING'):
        url += '?' + environ['QUERY_STRING']

This should never fail (no missing keys), and should always be accurate
except for details like a ? without a query string, an explicit port
that matches the default, or a path that the server has chosen to
normalize.
If it can't be accurate -- e.g., because SCRIPT_NAME or PATH_INFO have
been muddled (or even QUERY_STRING) -- then I'd like to have a
REQUEST_URI which is accurate.
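
To make those caveats concrete, here is the same reconstruction wrapped in a function and run against a sample environ. This is only a sketch: the name reconstruct_url and the example values are invented, and it assumes that HTTP_HOST, when present, already includes any non-default port.

```python
def reconstruct_url(environ):
    """Rebuild the request URL from CGI-style environ keys (a sketch)."""
    https = environ.get('HTTPS') == 'on'
    url = 'https://' if https else 'http://'
    default_port = '443' if https else '80'
    if environ.get('HTTP_HOST'):
        # Assumption: the Host header carries its own port if one is needed.
        url += environ['HTTP_HOST']
    else:
        url += environ['SERVER_NAME']
        if environ['SERVER_PORT'] != default_port:
            url += ':' + environ['SERVER_PORT']
    url += environ['SCRIPT_NAME']
    url += environ.get('PATH_INFO', '')
    if environ.get('QUERY_STRING'):
        url += '?' + environ['QUERY_STRING']
    return url


# Suppose the client asked for "http://example.com:80/app/page?".
# The explicit default port and the bare "?" are both lost.
plain = reconstruct_url({
    'SERVER_NAME': 'example.com',
    'SERVER_PORT': '80',
    'SCRIPT_NAME': '/app',
    'PATH_INFO': '/page',
    'QUERY_STRING': '',
})

# A named virtual host on a non-default port, over HTTPS.
vhost = reconstruct_url({
    'HTTPS': 'on',
    'HTTP_HOST': 'blog.example.com:8443',
    'SERVER_NAME': 'example.com',
    'SERVER_PORT': '8443',
    'SCRIPT_NAME': '',
    'PATH_INFO': '/archive',
    'QUERY_STRING': 'year=2004',
})
```

The first result comes out as http://example.com/app/page, which is equivalent to what the client sent but not identical to it. That gap is exactly what an untouched REQUEST_URI would cover.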
--
Ian Bicking / ianb at colorstudy.com / http://blog.ianbicking.org