[Web-SIG] Request for Comments on upcoming WSGI Changes
armin.ronacher at active-4.com
Tue Sep 22 10:29:33 CEST 2009
Alan Kennedy wrote:
> So, if nobody implements that, then why are we trying to standardise it?
I think that was just one of the ideas that were discussed.
Just to sum up where we have got to so far:
- My initial plan was bytes everywhere. It turns out that on Python 3
  this is nearly impossible, because most of the standard library went
  the unicode route, even where bytes would be more appropriate (like
  cgi.FieldStorage, urllib.parse etc.)
- Graham, Robert (and now I as well) are trying to get charset guessing
  for URLs working and to settle on latin1 for the HTTP headers. An
  application could re-decode a latin1 value if it really wanted utf-8,
  for instance (as with cookie headers, though I guess only there).
- One idea is enforcing unicode for all Python versions
- One idea is going unicode for Python 3 and bytestrings for Python 2
- New (and old) discussions bring up surrogate escapes (PEP 383).
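A minimal sketch of the latin1-header idea and of surrogate escapes, assuming the server decoded the raw header bytes as latin1 before handing them to the application. The helper `redecode_header` is hypothetical and not part of any WSGI spec. Because latin1 maps every byte 0x00-0xFF to exactly one code point, nothing is lost and the application can recover the wire bytes and re-decode them; the `surrogateescape` error handler instead decodes as utf-8 and carries undecodable bytes through as lone surrogates.

```python
def redecode_header(value: str, charset: str = "utf-8") -> str:
    """Re-decode a header value that the server decoded as latin1.

    Hypothetical helper: latin1 maps bytes 0x00-0xFF one-to-one onto
    U+0000-U+00FF, so encoding back to latin1 recovers the exact wire bytes.
    """
    return value.encode("latin-1").decode(charset)


# Cookie bytes as sent by a browser that used utf-8.
wire = "name=café".encode("utf-8")

# What the application would see under the "headers are latin1" rule.
as_latin1 = wire.decode("latin-1")      # 'name=cafÃ©' (mojibake)
fixed = redecode_header(as_latin1)      # 'name=café'

# Surrogate escapes: decode as utf-8, and carry bytes that are not
# valid utf-8 through as lone surrogates so nothing is lost.
bad = b"caf\xe9"                        # latin1 bytes, invalid as utf-8
escaped = bad.decode("utf-8", "surrogateescape")       # 'caf\udce9'
restored = escaped.encode("utf-8", "surrogateescape")  # b'caf\xe9'
```

The latin1 route keeps headers lossless at the cost of mojibake until the application re-decodes; surrogate escapes give correct text for valid utf-8 but leave odd surrogate characters in the string otherwise.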
So it's quite hard to follow because different people talk about
different ideas at the same time. And so far none of them looks really
> Is there a real need out there?
In Python 3, yes, because the stdlib no longer works with bytes and the
bytes object has few string semantics left.
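A small illustration of what "few string semantics left" means in practice on Python 3 (checked against current Python 3; note that PEP 461 later restored %-formatting for bytes in 3.5). The values are made up for illustration.

```python
from urllib.parse import unquote

# Indexing bytes yields an int, not a one-character string.
data = b"GET"
first = data[0]                 # 71, not b"G"

# Mixing bytes and str raises instead of converting implicitly.
try:
    b"/index" + "?q=1"
except TypeError:
    mixed_ok = False
else:
    mixed_ok = True

# str.format() has no bytes counterpart.
has_format = hasattr(bytes, "format")   # False

# urllib.parse decodes percent-escapes into str (utf-8 by default).
path = unquote("/caf%C3%A9")            # '/café'
```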
> Which is a worthy goal, IMHO. Java has been there since the very
> start, since java strings have always been unicode. Take a look at the
> java docs for HttpServlet: no methods return bytes/bytearrays.
And people appear to have problems with that, because under the hood it
decodes with a specified charset that defaults to iso-8859-1:
> Java programmers just tolerate this, although they may curse the
> developers of the servlet spec for not having solved their specific
> problem for them.
Many Java apps also still use latin1 only, or have all kinds of
problems with charsets.
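The Java default has a direct Python analogue: decoding percent-encoded form data with the wrong charset. A sketch using `urllib.parse.parse_qs`, whose `encoding` parameter selects the charset for percent-escapes (its own default is utf-8, not latin1); the query string is hypothetical.

```python
from urllib.parse import parse_qs

# Hypothetical query string: "café" percent-encoded as utf-8 by the browser.
query = "name=caf%C3%A9"

# A latin1 default, as in the servlet API unless the app sets a charset,
# turns each utf-8 byte into its own character.
as_latin1 = parse_qs(query, encoding="latin-1")   # {'name': ['cafÃ©']}

# Decoding with the charset the browser actually used gives the right text.
as_utf8 = parse_qs(query, encoding="utf-8")       # {'name': ['café']}
```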