[Web-SIG] Proposal to remove SCRIPT_NAME/PATH_INFO
Massimo Di Pierro
mdipierro at cs.depaul.edu
Wed Sep 23 05:05:49 CEST 2009
I really like your proposal.
On Sep 22, 2009, at 9:22 PM, Ian Bicking wrote:
> OK, I mentioned this in the last thread, but... I can't keep up with
> all this discussion, and I bet you can't either.
> So, here's a rough proposal for WSGI and unicode:
> I propose we switch primarily to "native" strings: str on both
> Python 2 and 3.
> environ keys: native
> environ CGI values: native
> wsgi.* (that is text): native
> response status: native
> response headers: native
> wsgi.input remains byte-oriented, as does the response app_iter.
> I then propose that we eliminate SCRIPT_NAME and PATH_INFO. Instead
> we have:
> wsgi.script_name
> wsgi.path_info (I'm not entirely set on these names)
> These both form the original path. It is not URL decoded, so it
> should be ASCII. (I believe non-ASCII could be rejected by the
> server, with Bad Request? A server could also choose to treat it as
> UTF8 or Latin1 and encode unsafe characters to make it ASCII.) Thus
> to re-form the URL, you do:
> environ['wsgi.url_scheme'] + '://' + environ['HTTP_HOST'] +
> environ['wsgi.script_name'] + environ['wsgi.path_info'] + '?' +
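A minimal sketch of that reconstruction, for illustration only. The wsgi.script_name and wsgi.path_info keys are the proposed names from above. Reading the query string from the existing CGI key QUERY_STRING, and dropping the '?' when it is empty, are assumptions of this sketch. reconstruct_url is a made-up helper name.

```python
def reconstruct_url(environ):
    """Re-form the request URL from the proposed, still URL-encoded keys."""
    url = (environ['wsgi.url_scheme'] + '://' + environ['HTTP_HOST']
           + environ['wsgi.script_name'] + environ['wsgi.path_info'])
    # Assumption: the query string is taken from the CGI key QUERY_STRING.
    query = environ.get('QUERY_STRING', '')
    if query:
        url += '?' + query
    return url


example_environ = {
    'wsgi.url_scheme': 'http',
    'HTTP_HOST': 'example.com',
    'wsgi.script_name': '/app',
    'wsgi.path_info': '/caf%C3%A9',  # not URL decoded, so plain ASCII
    'QUERY_STRING': 'x=1',
}
full_url = reconstruct_url(example_environ)
```

Because neither path component is URL decoded, plain concatenation is enough and no re-quoting step is needed.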
> All incoming headers will be treated as Latin1. If an application
> suspects another encoding, it is up to the application to transcode
> the header into another encoding. The transcoded value should not
> be put into the environ. In most cases headers should be ASCII, and
> Latin1 is simply a fallback that allows all bytes to be represented
> in both Python 2 and 3.
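To make the transcoding step concrete, here is a small Python 3 sketch. transcode_header is a hypothetical helper, not part of any spec, and the UTF8 default is only an example of "another encoding".

```python
def transcode_header(value, encoding='utf-8'):
    """Reinterpret a Latin1-decoded header value in another encoding.

    Latin1 maps every byte to one code point, so encoding back to Latin1
    recovers the exact bytes the client sent.
    """
    return value.encode('latin-1').decode(encoding)


# What the client sent on the wire (UTF8 bytes for "caf\u00e9") ...
wire_bytes = 'caf\u00e9'.encode('utf-8')
# ... and what the application would see in the environ under this proposal.
environ_value = wire_bytes.decode('latin-1')

# The application keeps the result in its own variable and does not write
# it back into the environ.
decoded = transcode_header(environ_value)
```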
> Similarly all outgoing headers will be Latin1. Thus if you (against
> good sense) decide to put UTF8 into a cookie, you can do:
> The server will then encode the text as latin1, sending the UTF8
> bytes. This is lame, but non-ASCII in headers is lame. It would be
> preferable to do:
> This sends different text, but is highly preferable. If you wanted
> to parse a cookie that was set as UTF8, you'd do:
> Again, it would be better to do:
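A sketch of the two cookie approaches in Python 3, not necessarily the code Ian had in mind. The cookie name "name" and the sample value are invented for illustration. The "better" variant here is assumed to be URL quoting with urllib.parse.

```python
from urllib.parse import quote, unquote

value = 'caf\u00e9'  # text the application wants to store in a cookie

# Latin1 trick: express the UTF8 bytes as a native string, one code point
# per byte. A server encoding this header as Latin1 sends the UTF8 bytes.
smuggled = value.encode('utf-8').decode('latin-1')
set_cookie_tricky = ('Set-Cookie', 'name=' + smuggled)

# Assumed preferable form: percent-encode so the header stays pure ASCII.
set_cookie_quoted = ('Set-Cookie', 'name=' + quote(value))

# Parsing the quoted form on a later request; the incoming header is
# ASCII, so Latin1 decoding by the server leaves it unchanged.
incoming = 'name=caf%C3%A9'
cookie_name, _, raw = incoming.partition('=')
parsed = unquote(raw)
```

The quoted form sends different text over the wire, but it never depends on how the server or the client treats non-ASCII header bytes.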
> Other variables like environ['wsgi.url_scheme'],
> environ['CONTENT_TYPE'], etc, will be native strings. A Python 3
> hello world app will then look like:
> def hello_world(environ):
>     return ('200 OK', [('Content-type', 'text/html; charset=utf8')],
>             ['Hello World!'.encode('utf8')])
> start_response and changes to wsgi.input are incidental to what I'm
> proposing here (except that wsgi.input will be bytes); we can decide
> about them separately.
> Outstanding issues:
> Well, the biggie: is it right to use native strings for the environ
> values, and response status/headers? Specifically, tricks like the
> latin1 transcoding won't work in Python 2, but will in Python 3. Is
> this weird? Or just something you have to think about when using
> the two Python versions?
> What happens if you give unicode text in the response headers that
> cannot be encoded as Latin1?
> Should some things specifically be ASCII? E.g., status.
> Should some things be unicode on Python 2?
> Is there a common case here that would be inefficient?
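On the question of response headers that cannot be encoded as Latin1: in Python 3 the encode step itself raises, so the open issue is really what a server should do with that error. A small sketch follows. encode_header is a hypothetical stand-in for whatever a server would do internally.

```python
def encode_header(name, value):
    """Encode a native-string header pair as Latin1 bytes."""
    return name.encode('latin-1'), value.encode('latin-1')


# U+00E9 exists in Latin1, so this header encodes fine.
ok = encode_header('X-Name', 'caf\u00e9')

# U+20AC (euro sign) has no Latin1 representation, so encoding fails.
try:
    encode_header('X-Price', '\u20ac5')
    failed = False
except UnicodeEncodeError:
    failed = True
```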
> Ian Bicking | http://blog.ianbicking.org | http://topplabs.org/civichacker