[Web-SIG] WSGI, Python 3 and Unicode

James Y Knight foom at fuhm.net
Fri Dec 7 21:53:03 CET 2007


On Dec 7, 2007, at 2:55 PM, Phillip J. Eby wrote:

> * When running under Python 3, servers MUST provide CGI HTTP
> variables as strings, decoded from the headers using HTTP standard
> encodings (i.e. latin-1 + RFC 2047)  (Open question: are there any
> CGI or WSGI variables that should NOT be strings?)

A WSGI gateway should *not* decode headers using RFC 2047. It actually
*cannot*, without knowing the structure of that particular header,
because only TEXT tokens are encoded that way. In addition, I know of
nobody who actually implements RFC 2047 decoding of HTTP header
values; nothing really uses it. (Of course, I don't know of all
implementations out there.)
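
To make the point above concrete, here is a minimal Python 3 sketch. It
assumes the gateway receives the header value as raw bytes; the helper
names gateway_decode and blind_rfc2047 are hypothetical and not part of
WSGI or any server. It shows that latin-1 decoding is lossless, while
applying RFC 2047 decoding blindly rewrites a value that the header's
grammar may have intended as an opaque token.

```python
from email.header import decode_header


def gateway_decode(raw: bytes) -> str:
    # latin-1 maps every byte to the code point with the same value,
    # so this never fails and can always be reversed.
    return raw.decode("latin-1")


def blind_rfc2047(value: str) -> str:
    # Hypothetical "decode everything" step, applied with no knowledge
    # of which parts of the header are TEXT.
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or "latin-1")
        parts.append(chunk)
    return "".join(parts)


# A header value that merely looks like an RFC 2047 encoded-word.
raw = b"=?UTF-8?B?Y2Fmw6k=?="
value = gateway_decode(raw)

# The latin-1 string round-trips to the exact bytes on the wire.
round_trip_ok = value.encode("latin-1") == raw

# Blind decoding changes the value, whether or not that was intended.
rewritten = blind_rfc2047(value)
```

Only code that knows the grammar of the specific header can tell whether
rewriting the value like this is correct, which is why it seems safer to
leave that step to the application.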


On Dec 7, 2007, at 3:24 PM, Ian Bicking wrote:

> I believe that SCRIPT_NAME/PATH_INFO would be UTF8 encoded, not
> latin1. That is, after you urldecode the values (as WSGI asks you to
> do) proper conversion to text is to decode it as UTF8.

Surely not! URLs aren't always UTF-8 encoded, only often.
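
As a small illustration in Python 3 (the two paths are made-up
examples, and try_utf8 is a hypothetical helper): the same name can
arrive percent-encoded from UTF-8 bytes or from latin-1 bytes, and only
the first survives a UTF-8 decode after urldecoding.

```python
from urllib.parse import unquote_to_bytes


def try_utf8(path: str):
    # Undo the percent-encoding to get bytes, then attempt UTF-8.
    raw = unquote_to_bytes(path)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


# Percent-encoded from UTF-8 bytes: decodes cleanly.
utf8_path = try_utf8("/caf%C3%A9")

# Percent-encoded from latin-1 bytes: not valid UTF-8.
latin1_path = try_utf8("/caf%E9")

# The same bytes are perfectly readable as latin-1.
latin1_text = unquote_to_bytes("/caf%E9").decode("latin-1")
```

A gateway that always assumes UTF-8 would have to either reject or
mangle the second request.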

James



