[Web-SIG] Python 3.0 and WSGI 1.0.
pje at telecommunity.com
Sat May 9 00:00:47 CEST 2009
At 10:37 AM 5/8/2009 -0700, Robert Brewer wrote:
>It also explicitly states that "HTTP does not directly support Unicode,
>and neither does this interface. All encoding/decoding must be handled
>by the application; all strings passed to or from the server must be
>standard Python BYTE STRINGS (emphasis mine), not Unicode objects. The
>result of using a Unicode object where a string object is required, is
>undefined."
It also says what the interpretation is when 'str' is a unicode string type.
>PEP 333 is difficult to interpret because it uses the name "str"
>synonymously with the concept "byte string", which Python 3000 defies. I
>believe the intent was to differentiate unicode from bytes, not elevate
>whatever type happens to be called "str" on your Python du jour. It was
>and is a mistake to standardize on type names ("str") across platforms
>and not on type behavior ("byte string").
Ironically, 'str' is what's consistent in type behavior; the bytes
type doesn't supply the same operations.
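As a rough illustration of that difference, here is a minimal sketch assuming a current Python 3 interpreter (the exact set of missing operations has changed between 3.x releases, and the variable names are only illustrative):

```python
text = "abc"    # Python 3 'str' (unicode)
data = b"abc"   # Python 3 'bytes'

# Indexing a str gives a one-character str; indexing bytes gives an int.
first_char = text[0]
first_byte = data[0]

# Iteration differs in the same way.
str_items = list(text)     # one-character strings
bytes_items = list(data)   # integers in range(256)

# str has a .format() method; bytes does not.
has_str_format = hasattr(text, "format")
has_bytes_format = hasattr(data, "format")
```

Code written against 'str' semantics (slicing out single characters, formatting, and so on) therefore does not carry over to bytes unchanged.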
>If Python3 WSGI apps emit unicode strings (py3k type 'str'), you're
>effectively saying the server will always call
>"chunk.encode('latin-1')". That negates any benefit of using unicode as
>the type for the response. That's not "supporting unicode"; that's using
>unicode exactly as if it were an opaque byte string. That seems silly
>to me when there is a perfectly useful byte string type.
Compatibility sometimes demands we do silly things. Personally, I
think it's kind of silly that Python 3 files return incompatible data
types depending on what mode you open them in, but there's not a
whole lot we can do about that.
Meanwhile, existing WSGI code ported to Python 3 is going to yield
strings until/unless manually converted; AFAIK 2to3 has no way to
automatically detect WSGI-ness and convert your strings to bytes.
>I don't see any benefit to that.
There isn't any benefit to doing it by *hand*. However, backward
compatibility demands that servers *accept* such strings, as they may
be generated by legacy apps.
That's why the Python 3 WSGI amendments say servers MUST accept this,
even though applications SHOULD supply bytes.
That is, for new code, we do want bytes. What we don't want, ever,
is unicode characters above #255 in any unicode strings sent as part
of the response body.
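
The rule above could be sketched on the server side roughly as follows. This is only an illustration, not text from the amendments: the helper name to_wire_bytes and the sample legacy_app are hypothetical, and raising TypeError for other types is an assumption.

```python
def to_wire_bytes(chunk):
    """Return one response-body chunk as bytes (hypothetical server helper)."""
    if isinstance(chunk, bytes):
        # New code: already bytes, pass through untouched.
        return chunk
    if isinstance(chunk, str):
        # Legacy code: each code point 0-255 maps to exactly one byte.
        # Any character above 255 raises UnicodeEncodeError here.
        return chunk.encode("latin-1")
    raise TypeError("response chunks must be bytes or str, got %r" % type(chunk))


def legacy_app(environ, start_response):
    """A ported Python 2 style app that still yields 'str' chunks."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return ["Hello, ", "world\n"]


# Minimal usage: collect the body the way a server might before writing it.
body = b"".join(
    to_wire_bytes(chunk)
    for chunk in legacy_app({}, lambda status, headers: None)
)
```

A chunk such as "\u20ac" would fail in the encode step, which is the "above #255" case a server is not expected to handle.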