[Web-SIG] WSGI, Python 3 and Unicode

Guido van Rossum guido at python.org
Fri Dec 7 01:27:41 CET 2007

On Dec 6, 2007 4:15 PM, Phillip J. Eby <pje at telecommunity.com> wrote:
> At 10:13 AM 12/7/2007 +1100, Graham Dumpleton wrote:
> >Has anyone had any thoughts about how WSGI is going to be made to work
> >with Python 3?
> >
> >From what I understand about changes in Python 3, the main issue seems
> >to be the removal of string type in its current form.
> >
> >This is an issue as the WSGI specification currently states that status,
> >header names/values and the items returned by the iterable must all be
> >string instances. This is done to ensure that the application has done
> >any conversions from Unicode, where knowledge about encoding would be
> >known, before being passed to the WSGI adapter.
> >
> >In Python 3 the default for string type objects will effectively be
> >Unicode. Is WSGI going to be made to somehow cope with that, or will
> >applications be required to return byte string objects instead?
>
> WSGI already copes, actually.  Note that Jython and IronPython have
> this issue today, and see:
> http://www.python.org/dev/peps/pep-0333/#unicode-issues
> """On Python platforms where the str or StringType type is in fact
> Unicode-based (e.g. Jython, IronPython, Python 3000, etc.), all
> "strings" referred to in this specification must contain only code
> points representable in ISO-8859-1 encoding (\u0000 through \u00FF,
> inclusive). It is a fatal error for an application to supply strings
> containing any other Unicode character or code point. Similarly,
> servers and gateways must not supply strings to an application
> containing any other Unicode characters."""
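
As a rough illustration of the PEP 333 rule quoted above, here is a
minimal sketch of the check a server or gateway could apply on a
platform with a Unicode-based str. The helper name is hypothetical; it
is not part of WSGI or wsgiref.

```python
def is_latin1_only(value):
    # Hypothetical helper: True when every code point in value lies in
    # the range U+0000 through U+00FF, i.e. is encodable as ISO-8859-1.
    try:
        value.encode("iso-8859-1")
    except UnicodeEncodeError:
        return False
    return True


print(is_latin1_only("200 OK"))      # plain ASCII status line
print(is_latin1_only("caf\u00e9"))   # U+00E9 is still within the range
print(is_latin1_only("\u1234"))      # outside the range
```

Under this rule the first two strings would be acceptable and the third
would be a fatal error.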

That may work for IronPython/Jython, where encoded data is represented
by the str type, but it won't be sufficient for Py3k, where encoded
data is represented using the bytes type. IOW, in IronPython/Jython,
u"\u1234".encode('utf-8') returns a str instance: '\xe1\x88\xb4'; but
in Py3k, it returns a bytes instance: b'\xe1\x88\xb4'.
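
A short sketch of the Py3k side of that difference, written against the
Python 3 syntax as it eventually shipped (no u"" prefix needed); the
variable names are only for illustration.

```python
# In Python 3, str holds Unicode text and encode() returns a distinct
# bytes object rather than another str.
text = "\u1234"
encoded = text.encode("utf-8")

print(type(text).__name__)      # str
print(type(encoded).__name__)   # bytes
print(encoded)                  # b'\xe1\x88\xb4'

# The two types do not mix implicitly: concatenating them raises TypeError.
try:
    text + encoded
except TypeError as exc:
    print("cannot mix str and bytes:", exc)

# Getting text back requires an explicit decode with a known encoding.
assert encoded.decode("utf-8") == text
```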

The issue applies to input as well as output -- data read from a
socket is also represented as bytes, unless you're using makefile()
with a text mode and an encoding.
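
To make the input side concrete, here is a small sketch using a local
socket pair in place of a real client connection (that setup, and the
sample payload, are assumptions for the sake of a self-contained
example). recv() hands back bytes, while makefile() in text mode with an
encoding hands back str.

```python
import socket

# A connected pair of sockets stands in for a client/server connection.
sender, receiver = socket.socketpair()
try:
    payload = "caf\u00e9\n".encode("utf-8")

    # Reading directly from the socket yields bytes.
    sender.sendall(payload)
    raw = receiver.recv(1024)
    print(type(raw).__name__, raw)

    # Wrapping the socket with makefile() in text mode decodes for us.
    sender.sendall(payload)
    with receiver.makefile("r", encoding="utf-8", newline="\n") as reader:
        line = reader.readline()
    print(type(line).__name__, repr(line))
finally:
    sender.close()
    receiver.close()
```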

You might want to look at how the unit tests for wsgiref manage to pass
in Py3k though. ;-)

--Guido van Rossum (home page: http://www.python.org/~guido/)
