[Python-Dev] PEP 3333: wsgi_string() function

Raymond Hettinger raymond.hettinger at gmail.com
Fri Jan 7 01:00:27 CET 2011


Can you please take a look at
http://docs.python.org/dev/whatsnew/3.2.html#pep-3333-python-web-server-gateway-interface-v1-0-1
to see if it accurately recaps the resolution of the WSGI text/bytes issues?
I would appreciate any feedback, as it is likely that the whatsnew
document will be most people's first chance to hear the outcome
of the multi-year discussion.

Thanks,


Raymond


On Jan 6, 2011, at 3:50 PM, And Clover wrote:

> On Tue, 2011-01-04 at 03:44 +0100, Victor Stinner wrote:
>> What is this horrible encoding "bytes-as-unicode"?
> 
> It is a unicode string decoded from bytes using ISO-8859-1. ISO-8859-1
> is the encoding specified by the HTTP RFC, as well as having the happy
> property of preserving every input byte. PEP 3333 requires it.
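
A minimal sketch of the round trip described above, assuming an application that uses UTF-8. The helper names `to_wsgi_string` and `from_wsgi_string` are hypothetical; they are not defined by PEP 3333 or wsgiref.

```python
def to_wsgi_string(raw: bytes) -> str:
    """Server side: wrap raw header bytes as a 'bytes-as-unicode' str."""
    return raw.decode("iso-8859-1")


def from_wsgi_string(value: str, encoding: str = "utf-8") -> str:
    """Application side: recover the original bytes, then decode them
    with the encoding the application actually uses (assumed UTF-8)."""
    return value.encode("iso-8859-1").decode(encoding)


# ISO-8859-1 maps each byte 0-255 to the code point with the same number,
# so every possible byte survives the round trip unchanged.
all_bytes = bytes(range(256))
assert to_wsgi_string(all_bytes).encode("iso-8859-1") == all_bytes

# A UTF-8 encoded path as it might arrive on the wire.
raw_path = "/caf\u00e9".encode("utf-8")
environ_value = to_wsgi_string(raw_path)      # looks wrong, but is lossless
real_path = from_wsgi_string(environ_value)   # the application's own decode
```

The intermediate string is not meaningful text; it is only a lossless carrier for the bytes until the application decodes them.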
> 
>> os.environ is supposed to be correctly decoded and contain valid
>> unicode characters.
> 
> It is not possible to ‘correctly’ decode to unicode for os.environ
> because that decoding happens long before the web application (the
> only party that knows what encoding should be in use) gets a look in.
> 
> Maybe the web application is using UTF-8, maybe it's using cp1252,
> but if we let the server/gateway decide and do that decoding
> before the application can do anything about it, we will get the wrong
> encoding in *many* cases and the result will be permanent, unrecoverable
> mangling of non-ASCII characters in submitted headers.
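
To illustrate the mangling, here is a small sketch assuming a cp1252 application behind a gateway that guesses UTF-8 and replaces undecodable bytes. The scenario and variable names are made up for illustration.

```python
# Bytes submitted by a client of a cp1252 application.
raw = "caf\u00e9".encode("cp1252")          # b'caf\xe9'

# The gateway guesses UTF-8 and substitutes U+FFFD for what it cannot
# decode. The original byte is gone; nothing downstream can restore it.
guessed = raw.decode("utf-8", errors="replace")
lossy = "\ufffd" in guessed

# The same bytes carried as ISO-8859-1 lose nothing, so the application
# can still apply the encoding it knows to be correct.
wrapped = raw.decode("iso-8859-1")
recovered = wrapped.encode("iso-8859-1").decode("cp1252")
```

Strict decoding instead of `errors="replace"` would avoid silent corruption, but it would raise an exception for the same input, so the request would still fail.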
> 
>> If WSGI uses another encoding than the locale encoding (which is a bad
>> idea),
> 
> It's an absolutely necessary idea. The locale encoding is nothing to do
> with the web application's encoding. Windows applications need to be
> able to use UTF-8 (which is never the ANSI code page), and web
> applications in general need to be deployable to any server without
> having to worry about the server's locale.
> 
> The locale-dependent status quo is that non-ASCII characters in URL
> paths and other HTTP headers don't work for Python apps.
> 
> The recoding dances present in wsgiref's CGIHandler for 3.2 are
> distasteful but completely necessary to normalise differences in
> encodings used by various servers and platforms to generate their CGI
> environment.
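
A rough sketch of the kind of recoding involved. This is not wsgiref's actual code. It assumes a POSIX-style `os.environ` whose values were decoded with the `surrogateescape` error handler, and `recode_cgi_value` is a hypothetical helper. The real handler also has to cope with per-server and Windows differences, which this ignores.

```python
def recode_cgi_value(value: str, source_encoding: str) -> str:
    """Undo the environment's decoding to get the original bytes back,
    then re-decode them as ISO-8859-1 as PEP 3333 requires."""
    raw = value.encode(source_encoding, "surrogateescape")
    return raw.decode("iso-8859-1")


# Simulate a server that put UTF-8 bytes into the CGI environment while
# Python decoded that environment as ASCII with surrogateescape.
raw_from_server = "/caf\u00e9".encode("utf-8")
seen_in_os_environ = raw_from_server.decode("ascii", "surrogateescape")

wsgi_value = recode_cgi_value(seen_in_os_environ, "ascii")
```

In a real process the source encoding would normally be `sys.getfilesystemencoding()` rather than a hard-coded name.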
> 
>> it should use os.environb and decodes keys and values using its
>> own encoding.
> 
> Well yes, but:
> 
> (a) os.environb doesn't exist in Python 3.1 or earlier, making it
> impossible to implement WSGI before 3.2;
> (b) a byte environment on Windows would have to be encoded
> from the Unicode environment, with a server-specific encoding,
> and then what encoding are you going to choose for the variables
> that contain non-HTTP-sourced native Unicode strings (such as,
> very commonly, Windows pathnames)?
> 
> The bytes-or-bytes-in-Unicode argument is something that has been
> bounced around Web-SIG for literally *years*; this is what we ended up
> with. Although I personally like bytes, frankly, a re-run of this
> argument *again* whilst WSGI remains in perpetual stalemate does not
> appeal. WSGI and wsgiref in Python 3.0-3.1 simply do not work. This
> has long been an embarrassing situation for what is supposed to be a
> leading web language. Let us not perpetuate this sorry story to 3.2
> as well.
> 
> -- 
> And Clover
> mailto:and at doxdesk.com http://www.doxdesk.com
> skype:uknrbobince gtalk:chat?jid=bobince at gmail.com
> 
