[Python-Dev] PEP 3333: wsgi_string() function

P.J. Eby pje at telecommunity.com
Tue Jan 4 17:27:53 CET 2011


At 03:44 AM 1/4/2011 +0100, Victor Stinner wrote:
>Hi,
>
>In the PEP 3333, I read:
>--------------
>import os, sys
>
>enc, esc = sys.getfilesystemencoding(), 'surrogateescape'
>
>def wsgi_string(u):
>     # Convert an environment variable to a WSGI "bytes-as-unicode" string
>     return u.encode(enc, esc).decode('iso-8859-1')
>
>def run_with_cgi(application):
>     environ = {k: wsgi_string(v) for k,v in os.environ.items()}
>     environ['wsgi.input']        = sys.stdin
>     environ['wsgi.errors']       = sys.stderr
>     environ['wsgi.version']      = (1, 0)
>...
>--------------
>
>What is this horrible encoding "bytes-as-unicode"? os.environ is
>supposed to be correctly decoded and contain valid unicode characters.
>If WSGI uses another encoding than the locale encoding (which is a bad
>idea), it should use os.environb and decode keys and values using its
>own encoding.
>
>If you really want to store bytes in unicode, str is not the right type:
>use the bytes type and use os.environb instead.

If you want to discuss this, the Web-SIG is the appropriate 
place.  Also, it was the appropriate place months ago, when the final 
decision on the environ encoding was made.  ;-)

IOW, the above change to the PEP is merely fixing the code example to 
be correct for Python 3, where it previously was correct only for 
Python 2.  The PEP itself has already required this since the 
previous revisions, and wsgiref in the stdlib is already compliant 
with the above (although it uses a more sophisticated approach for 
dealing with win32 compatibility).
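
For anyone unsure what "bytes-as-unicode" means in practice, here is a minimal sketch of the round trip. It is not the wsgiref implementation. The helper names to_wsgi_string() and from_wsgi_string() and the explicit `enc` parameter are made up for illustration, and the sketch assumes the application knows (or guesses) that the original bytes were UTF-8.

```python
import sys

ESC = 'surrogateescape'

def to_wsgi_string(u, enc=None):
    # Hypothetical helper: recover the raw bytes behind an os.environ-style
    # str, then map each byte to the code point with the same value.
    enc = enc or sys.getfilesystemencoding()
    return u.encode(enc, ESC).decode('iso-8859-1')

def from_wsgi_string(s, charset='utf-8'):
    # Hypothetical helper: an application that knows the real charset
    # undoes the latin-1 step and decodes the original bytes itself.
    return s.encode('iso-8859-1').decode(charset)

# A value the OS layer decoded as UTF-8.
path = '/caf\u00e9'
wsgi_value = to_wsgi_string(path, enc='utf-8')

# Every character in a WSGI string is in the range U+0000..U+00FF.
assert all(ord(ch) < 256 for ch in wsgi_value)

# The application gets the original text back by re-decoding.
assert from_wsgi_string(wsgi_value) == path
```

The point of the latin-1 step is that it is lossless for any byte sequence, so the server never has to guess the charset of the request data; that decision is left to the application.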

The rationale for this choice is described in the PEP, and was also 
discussed in the mailing list emails back when the work was being done.

IOW, this particular ship already sailed a long time ago.  In fact, 
for Jython this bytes-as-unicode approach has been the PEP 
333-defined encoding for at least *six years*...  so it's REALLY late 
to complain about it now! ;-)

PEP 3333 is merely a mapping of PEP 333 to allow WSGI apps to be 
ported from Python 2 to Python 3.  There is work in progress on the 
Web-SIG now on PEP 444, which will support only Python 2.6+, where 
'b' literals and the 'bytes' alias are available.  It is as yet 
uncertain what environ encoding will be used, but at the moment I'm 
not convinced that either pure bytes or pure unicode is an acceptable 
replacement for the PEP 333-compatible approach.

In any event, that is a discussion for the Web-SIG, not Python-Dev.


