[Web-SIG] WSGI, Python 3 and Unicode

Phillip J. Eby pje at telecommunity.com
Fri Dec 7 20:55:47 CET 2007


So here are my recommendations so far for the addendum to WSGI *1.0* 
for Python 3.0 (I expect we can be more strict for WSGI 2.0):

* When running under Python 3, applications SHOULD produce bytes 
output and headers

* When running under Python 3, servers and gateways MUST accept 
strings as application output or headers, under the existing rules 
(i.e., s.encode('latin-1') must convert the string to bytes without 
an exception)

* When running under Python 3, servers MUST provide CGI HTTP 
variables as strings, decoded from the headers using HTTP standard 
encodings (i.e., latin-1 + RFC 2047).  (Open question: are there any 
CGI or WSGI variables that should NOT be strings?)

* When running under Python 3, servers MUST make wsgi.input a binary 
(byte) stream

* When running under Python 3, servers MUST provide a text stream for 
wsgi.errors
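
As a rough illustration of the server-side rules above, here is a minimal sketch of how a gateway might populate part of the environ.  The helper name build_environ and the sample header are made up for this example, and RFC 2047 decoding is deliberately left out; only the latin-1 step is shown.

```python
import io
import sys


def build_environ(raw_headers, body):
    """Hypothetical helper: build a partial environ from raw header bytes.

    raw_headers is a list of (name, value) byte pairs as read off the wire;
    body is the request body as bytes.
    """
    environ = {
        "wsgi.input": io.BytesIO(body),  # binary (byte) stream
        "wsgi.errors": sys.stderr,       # text stream
    }
    for name, value in raw_headers:
        # CGI-style key: HTTP_ prefix, upper case, dashes become underscores.
        key = "HTTP_" + name.decode("latin-1").upper().replace("-", "_")
        # latin-1 maps every byte to a code point, so this cannot fail.
        environ[key] = value.decode("latin-1")
    return environ


environ = build_environ([(b"X-Greeting", b"caf\xe9")], b"a=1")
```

Because latin-1 decoding is lossless, an application that needs the original bytes can get them back with value.encode('latin-1').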

These rules are intended to simplify the porting of existing 
code.  Notice, for example, that these rules allow middleware to pass 
strings through unchanged, since they are not required to produce 
bytes output or headers.
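
To make the pass-through point concrete, here is a sketch of one way a server could accept either strings or bytes from an application.  All names here (to_wire_bytes, app, passthrough_middleware) are invented for illustration and are not part of any proposed API.

```python
def to_wire_bytes(value):
    """Hypothetical server helper: accept bytes as-is, encode str as latin-1."""
    if isinstance(value, bytes):
        return value
    # Raises UnicodeEncodeError for characters outside latin-1, which is
    # the "without an exception" condition in the rules above.
    return value.encode("latin-1")


def app(environ, start_response):
    # An unported application that still emits strings, mixed with bytes.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return ["hello ", b"world"]


def passthrough_middleware(wrapped):
    # Middleware can hand strings through untouched; only the server
    # at the end of the chain has to convert them.
    def wrapper(environ, start_response):
        return wrapped(environ, start_response)
    return wrapper


collected = {}


def start_response(status, headers):
    collected["status"] = status
    collected["headers"] = [
        (to_wire_bytes(name), to_wire_bytes(value)) for name, value in headers
    ]


chunks = passthrough_middleware(app)({}, start_response)
body = b"".join(to_wire_bytes(chunk) for chunk in chunks)
```

Here body ends up as b"hello world" even though the application returned a mix of str and bytes, and the middleware never had to look at the types.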

Unfortunately, wsgi.input can't be coded around, but for most 
frameworks this should be a single point of pain.  In fact, if the 
'cgi' stdlib module is made compatible with bytes, only the rare 
framework that rolls its own multipart parser or otherwise directly 
manipulates PUT/POST data will be affected.  Code that just takes the 
input and writes it to a file won't be bothered, either.
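
For instance, code along the following lines never needs to know what the bytes mean, so a binary wsgi.input costs it nothing.  This is only a sketch; save_request_body and the 64 KiB chunk size are assumptions made for the example.

```python
import io


def save_request_body(environ, destination, chunk_size=65536):
    """Hypothetical helper: copy the request body to a binary file object.

    Returns the number of bytes written.
    """
    remaining = int(environ.get("CONTENT_LENGTH") or 0)
    written = 0
    while remaining > 0:
        chunk = environ["wsgi.input"].read(min(remaining, chunk_size))
        if not chunk:
            break  # client sent less than CONTENT_LENGTH promised
        destination.write(chunk)  # bytes in, bytes out; no decoding
        written += len(chunk)
        remaining -= len(chunk)
    return written


environ = {"wsgi.input": io.BytesIO(b"\x00\xffraw"), "CONTENT_LENGTH": "5"}
sink = io.BytesIO()  # stands in for a file opened in 'wb' mode
written = save_request_body(environ, sink)
```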

Comments or questions?


