[Web-SIG] WSGI, Python 3 and Unicode
James Y Knight
foom at fuhm.net
Fri Dec 7 21:53:03 CET 2007
On Dec 7, 2007, at 2:55 PM, Phillip J. Eby wrote:
> * When running under Python 3, servers MUST provide CGI HTTP
> variables as strings, decoded from the headers using HTTP standard
> encodings (i.e. latin-1 + RFC 2047) (Open question: are there any
> CGI or WSGI variables that should NOT be strings?)
A WSGI gateway should *not* decode headers using RFC 2047. It actually
*cannot* without knowing the structure of that particular header,
because only TEXT tokens are encoded that way. In addition, I know of
nobody who actually implements RFC 2047 decoding of HTTP header
values; nothing really uses it. (Of course, I don't know of all
implementations out there.)
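To make that concrete, here is a rough sketch of what a gateway could do instead: decode the header bytes as latin-1 and stop there. The header value and the function name are made up for illustration, and nothing here comes from the WSGI spec itself. The point is only that latin-1 is lossless, so an application that does know the header's structure can still get at the original bytes, and anything that looks like an RFC 2047 encoded-word is passed through untouched.

```python
# Hypothetical raw header value as received off the wire (bytes).
# It contains a non-ASCII byte and something shaped like an
# RFC 2047 encoded-word.
raw_value = b"caf\xe9 =?utf-8?q?caf=C3=A9?="


def header_bytes_to_str(raw: bytes) -> str:
    """Decode header bytes as latin-1 only (illustrative helper).

    No RFC 2047 processing is attempted, since the gateway does not
    know whether this part of the header is a TEXT token.
    """
    return raw.decode("latin-1")


value = header_bytes_to_str(raw_value)

# latin-1 maps every byte to the code point with the same number,
# so the application can recover the exact original bytes.
recovered = value.encode("latin-1")
```

An application that understands the specific header could then apply whatever further decoding that header calls for, starting from `recovered`.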
On Dec 7, 2007, at 3:24 PM, Ian Bicking wrote:
> I believe that SCRIPT_NAME/PATH_INFO would be UTF8 encoded, not
> latin1.
> That is, after you urldecode the values (as WSGI asks you to do)
> proper conversion to text is to decode it as UTF8.
Surely not! URLs aren't always UTF-8 encoded, only often.
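A small sketch of the problem, using two made-up paths: both percent-decode to perfectly legal byte strings, but only one of them is valid UTF-8. The helper name is an assumption for illustration; `unquote_to_bytes` is the standard library function in Python 3's `urllib.parse`.

```python
from urllib.parse import unquote_to_bytes


def is_utf8_path(quoted_path: str) -> bool:
    """Return True if the percent-decoded path is valid UTF-8.

    Illustrative helper, not part of any WSGI API.
    """
    raw = unquote_to_bytes(quoted_path)
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# "/caf%C3%A9" is the UTF-8 encoding of the e-acute; "/caf%E9" is the
# latin-1 encoding of the same character. Both are legal URLs.
utf8_ok = is_utf8_path("/caf%C3%A9")
latin1_ok = is_utf8_path("/caf%E9")

# Decoding as latin-1 never fails and keeps the bytes recoverable.
latin1_text = unquote_to_bytes("/caf%E9").decode("latin-1")
```

A gateway that blindly decoded PATH_INFO as UTF-8 would have to either raise or mangle the second path.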
James