[Web-SIG] String Types in WSGI [Graham's WSGI for py3]

Fri Sep 18 14:06:48 CEST 2009

Hi,

Let me back up a bit here.

We have to focus on two different use cases for WSGI on Python 3.  One
is the existing application that should continue to work on Python 3;
the other is the application that was designed for Python 3.

In both cases, let's just assume that the application uses
WebOb/Werkzeug/Django or whatever library is in use.

2to3 converts "foo" and u"foo" to "foo".  However, in Python 3 "foo" is
unicode, so that's fine as long as the library only exposes unicode
data.  This is the case for all the frameworks and libraries: template
engines, database adapters and frameworks all use unicode internally,
which is great.
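A quick illustration of the 2to3 point, as a sketch that assumes Python 3.3 or later (the u"" prefix was a syntax error in Python 3.0 to 3.2, which is why 2to3 strips it; PEP 414 re-allowed it in 3.3):

```python
# In Python 3 a plain literal is already a text (unicode) string,
# so "foo" and u"foo" are the same value of type str.
plain = "foo"
prefixed = u"foo"   # only valid syntax on Python 3.3+
raw_bytes = b"foo"  # bytes are a separate type and never equal to str

print(type(plain).__name__)                # str
print(plain == prefixed)                   # True
print(plain == raw_bytes)                  # False
print(raw_bytes.decode("utf-8") == plain)  # True, once explicitly decoded
```

As long as a library hands the application str objects, the converted literals compare and concatenate with them without any extra work.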

Whether the WSGI server or the library figures out the charset, the
data forwarded to the application is unicode either way.  So what would
we gain from doing the decoding in the server?

On the bright side, 2to3 would probably start working for some raw WSGI
applications, but it would still break many.  On the other hand, the
frameworks would still have to perform encoding detection for things
like multipart or URL-encoded form data.  Even worse, they would have to
apply different decoding rules for form data and for things like path
info.
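To make the "different decoding rules" concrete, here is a minimal sketch of what a library ends up doing, under the assumption that the server passes PATH_INFO and the request body through as bytes. The helper names decode_path_info and decode_form are hypothetical and not part of WebOb, Werkzeug or Django; only urllib.parse.parse_qs is a real standard-library call.

```python
from urllib.parse import parse_qs


def decode_path_info(raw_path: bytes, charset: str = "utf-8") -> str:
    # Hypothetical helper. Assumes raw_path is PATH_INFO as bytes that the
    # server has already percent-decoded, so only the charset decode is left.
    return raw_path.decode(charset, "replace")


def decode_form(body: bytes, charset: str = "utf-8") -> dict:
    # Hypothetical helper. application/x-www-form-urlencoded bodies are
    # percent-encoded ASCII: parse_qs splits on "&" and "=", unquotes each
    # part, and only then decodes the resulting bytes with `charset`.
    return parse_qs(body.decode("ascii"), encoding=charset)


print(decode_path_info("/caf\u00e9".encode("utf-8")))   # /café
print(decode_form(b"name=J%C3%BCrgen&city=K%C3%B6ln"))  # {'name': ['Jürgen'], 'city': ['Köln']}
print(decode_form(b"name=J%FCrgen", charset="latin-1")) # {'name': ['Jürgen']}
```

The two rules cannot be merged: the path is decoded once, while the form body is unquoted first and decoded second, possibly with a charset that differs from the one used for the path.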

It already caused confusion in the past that path info was unquoted,
with many people quoting that value.  It would be even worse in the
future if path info were proper unicode while the query string looked
like unicode but was actually URL-encoded data in a different encoding,
etc.  I can see major confusion coming up there, and it would not
remove any complexity for real-world implementations of WSGI.
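Both sources of confusion can be reproduced with the standard library alone. This sketch uses only urllib.parse.unquote and urllib.parse.parse_qs; the paths and the query string are made-up sample values.

```python
from urllib.parse import parse_qs, unquote

# Two request paths that differ only in whether the slash is escaped.
escaped = "/files/a%2Fb"
plain = "/files/a/b"

# PATH_INFO is unquoted before the application sees it, so the
# distinction between %2F and a real "/" is already gone.
print(unquote(escaped) == unquote(plain))  # True

# QUERY_STRING is passed on still percent-encoded. As a Python 3 str it
# looks like text, but it still has to be unquoted and decoded, and the
# result depends on which charset the application guesses.
query_string = "name=J%C3%BCrgen"
print(parse_qs(query_string))                      # {'name': ['Jürgen']}
print(parse_qs(query_string, encoding="latin-1"))  # {'name': ['JÃ¼rgen']}
```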

Regards,
Armin