[Web-SIG] Move to bless Graham's WSGI 1.1 as official spec
Manlio Perillo
manlio_perillo at libero.it
Thu Dec 3 11:55:51 CET 2009
James Y Knight wrote:
> I move to bless mod_wsgi's definition of WSGI 1.1 [1]
> [...]
>
> [1] http://code.google.com/p/modwsgi/wiki/SupportForPython3X
Hi.
Just a few questions.
It is true that HTTP headers can be handled by assuming latin-1, and
that they can also be handled using PEP 383 (surrogateescape).
However, what about URIs (that is, PATH_INFO and the like)?
For URIs (if I remember correctly) the suggested encoding is UTF-8, so
URLs should be decoded using

    url.decode('utf-8', 'surrogateescape')

Is this correct?
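A minimal sketch of the round trip being asked about, assuming the
server has the raw path bytes at hand. The sample bytes and variable
names are made up for illustration; they are not part of any spec.

```python
# Hypothetical raw path bytes: valid UTF-8 for "caffè", then a stray 0xff byte.
raw_path = b'/caff\xc3\xa8/\xff'

# With surrogateescape the decode never fails: each undecodable byte
# becomes a lone surrogate in the range U+DC80..U+DCFF.
path = raw_path.decode('utf-8', 'surrogateescape')

# Encoding with the same codec and error handler restores the exact bytes.
restored = path.encode('utf-8', 'surrogateescape')
```

The point is only that the original bytes survive, as long as every
later step encodes with the same codec and error handler.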
Now another question.
Let's consider the `wsgiref.util.application_uri` function:

    def application_uri(environ):
        url = environ['wsgi.url_scheme']+'://'
        from urllib.parse import quote

        if environ.get('HTTP_HOST'):
            url += environ['HTTP_HOST']
        else:
            url += environ['SERVER_NAME']

            if environ['wsgi.url_scheme'] == 'https':
                if environ['SERVER_PORT'] != '443':
                    url += ':' + environ['SERVER_PORT']
            else:
                if environ['SERVER_PORT'] != '80':
                    url += ':' + environ['SERVER_PORT']

        url += quote(environ.get('SCRIPT_NAME') or '/')
        return url

There is a potential problem here with the quote function.
This function does the following:

    def quote(string, safe='/', encoding=None, errors=None):
        if isinstance(string, str):
            if encoding is None:
                encoding = 'utf-8'
            if errors is None:
                errors = 'strict'
            string = string.encode(encoding, errors)

This means that if we use surrogateescape, the information about the
original bytes is lost here, because the string is encoded with the
'strict' error handler rather than with surrogateescape.
This can easily be fixed by changing the application_uri function, but
it also means that a WSGI application will not work with Python 3.1.x.
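A small sketch of the problem and of one possible workaround, assuming
a current Python 3 `urllib.parse.quote`. The SCRIPT_NAME value is
hypothetical, and the workaround is only one option, not what wsgiref
actually does.

```python
from urllib.parse import quote

# Hypothetical SCRIPT_NAME containing a byte that is not valid UTF-8,
# decoded by the server with surrogateescape.
script_name = b'/app/\xff'.decode('utf-8', 'surrogateescape')

# With the defaults, quote() encodes using errors='strict',
# which rejects the lone surrogate.
try:
    quote(script_name)
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# Possible workaround: recover the original bytes first, then quote the bytes.
quoted = quote(script_name.encode('utf-8', 'surrogateescape'))
```

Passing `errors='surrogateescape'` to `quote` should have the same
effect as re-encoding by hand.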
Finally, a question about cookies.
Cookie data SHOULD be transparent to the server/gateway; however, WSGI
is going to assume that the data is encoded in latin-1.
I don't know what the HTTP/Cookie spec says about this.
However, from a WSGI application's point of view, the cookie data can,
for example, contain some text encoded in UTF-8; this means that the
application must first encode the data back to bytes:

    cookie_bytes = cookie.encode('latin-1', 'surrogateescape')

and then decode it using UTF-8:

    my_cookie_data = cookie_bytes.decode('utf-8')

This is a bit unreasonable, but I don't know whether storing UTF-8 in
cookies is common practice (I do it myself, to give one example).
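The two steps above can be sketched end to end as follows, assuming the
server decoded the Cookie header as latin-1. The cookie value is a
made-up example.

```python
# Hypothetical cookie value as sent on the wire: the UTF-8 bytes of "città".
wire_bytes = 'città'.encode('utf-8')

# What the application would receive if the server decoded the header as latin-1.
cookie = wire_bytes.decode('latin-1')

# Step 1: undo the latin-1 decoding to get the original bytes back.
cookie_bytes = cookie.encode('latin-1', 'surrogateescape')

# Step 2: decode with the encoding the application really used.
my_cookie_data = cookie_bytes.decode('utf-8')
```

Since latin-1 maps every byte to a code point, step 1 always recovers
the exact bytes; step 2 can still fail if the cookie was not UTF-8.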
Manlio Perillo