[Web-SIG] String Types in WSGI [Graham's WSGI for py3]
armin.ronacher at active-4.com
Fri Sep 18 13:30:44 CEST 2009
Graham Dumpleton schrieb:
> I believe it does matter and that it contains ASCII possibly doesn't
> mean it is somehow simpler. The reason is that URL reconstruction
> recipe as per WSGI PEP has to work. Ie.
That of course will not work and is not something we should aim for.
There is a lot of stuff that will break as well, and libraries are
supposed to fix that on the 2.x -> 3.x transition. Actually in 2.6 you
can use bytestring literals that will fix that problem for you. The
only problem left is wsgi.url_scheme, and for that you just have to use
an explicit .encode() call. No big deal.
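To make that concrete, here is a sketch of what the PEP's URL reconstruction recipe could look like against a hypothetical bytes-based environ. Everything here is illustrative rather than spec text: the key names come from the PEP, but the bytes-environ convention and the helper itself are assumptions, and re-quoting of the path is omitted for brevity.

```python
def reconstruct_url(environ):
    """Rebuild the request URL from a hypothetical bytes-based environ.

    Sketch only: assumes all CGI values are bytes, while
    wsgi.url_scheme stays a text string and therefore needs an
    explicit .encode() before it can be joined with the byte values.
    """
    url = environ['wsgi.url_scheme'].encode('ascii') + b'://'
    host = environ.get('HTTP_HOST')
    if host:
        url += host
    else:
        url += environ['SERVER_NAME']
        default = b'443' if environ['wsgi.url_scheme'] == 'https' else b'80'
        if environ['SERVER_PORT'] != default:
            url += b':' + environ['SERVER_PORT']
    url += environ.get('SCRIPT_NAME', b'')
    url += environ.get('PATH_INFO', b'')
    if environ.get('QUERY_STRING'):
        url += b'?' + environ['QUERY_STRING']
    return url
```

The one place the bytes/text boundary shows up is the scheme: a single `.encode('ascii')` and the rest of the concatenation stays in bytes.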
> This is also why all the other CGI variables are similarly made to be
> unicode strings. That is, so all the same type and stuff like URL
> reconstruction will work.
In an ideal world, maybe. But the only thing more evil than
UnicodeErrors are silent encoding errors that are hard to track down.
(What just destroyed my charset information? Oh, it was the WSGI gateway
in combination with an ancient internet explorer version)
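For illustration, a minimal example of what such a silent error looks like in Python 3 (the word "käse" is just an invented payload): latin-1 can decode *any* byte sequence, so a gateway that guesses the wrong charset never raises, it just quietly produces mojibake.

```python
raw = 'käse'.encode('utf-8')   # what the browser actually sent

# Decoding UTF-8 data as latin-1 never fails -- every byte is a
# valid latin-1 code point -- so the error passes silently.
wrong = raw.decode('latin-1')
right = raw.decode('utf-8')

assert wrong == 'kÃ¤se'   # charset information silently destroyed
assert right == 'käse'
```

A loud UnicodeError at the boundary is annoying; text like `kÃ¤se` three layers later is much harder to track down.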
> If bytes is used, you could potentially end up with messy situations
> where you have to perform URL reconstruction as bytes, but then
> convert it to unicode strings to stuff it in as a parameter into some
> templating system where the template text is unicode.
URLs are ASCII only, IRIs are not. If you are working with Python 3 you
would probably start using IRIs internally after a while because "it
just works".
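A small sketch of that URL/IRI distinction: an IRI may carry non-ASCII characters directly, while a URL is the percent-encoded ASCII form. The `/wiki/Zürich` path is an invented example.

```python
from urllib.parse import quote, unquote

iri_path = '/wiki/Zürich'                 # IRI: non-ASCII allowed
url_path = quote(iri_path.encode('utf-8'))  # URL: percent-encoded ASCII

assert url_path == '/wiki/Z%C3%BCrich'
assert unquote(url_path) == iri_path      # round-trips back to the IRI
```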
> If SCRIPT_NAME, PATH_INFO and QUERY_STRING are in bytes form and they
> needed different encodings, how do you easily convert your bytes
> strings to the unicode string needed to stuff in the template. Can't
> see how you could, they really need to be in unicode if everything
> else in the system is going to be unicode. Or are templating systems
> now going to be expected to drop down and use bytes all the time as well?
I still defend my point that charsets are a complex topic and it's the
framework / library that should deal with that. WebOb does, Werkzeug
does, Django does, and I'm sure web.py and other libraries do too. If
someone wants to shoot themselves in the foot by implementing their own
library based on WSGI, we should not stop them.
> As a way of deterring a lot of users, making it harder to use, or at
> least making it more obvious that thought is required, would be quite
> This would also be good in pushing people to use existing
> frameworks/toolkits which deal with all this stuff internally and hide
> it and instead present unicode strings at a higher level after doing
> everything correctly.
I like that idea a lot :)
> As others have pointed out, the likes of rack and jack, not sure about
> the new Perl variant, don't seem to have an issue with using unicode.
Ruby does not use unicode internally; it uses encoding-marked strings.
That is, if a string comes in as iso-8859-15, it's marked as such and
Ruby knows how to deal with it. As far as I know Rack does not specify
charsets at all, which probably means it's up to the implementation
to decide what to use. Rack will have the problem with charsets soon
enough, they just don't care about unicode enough (yet?).
> I am still waiting for the good explanation of why access to the raw
> URL quoted values is so important. Can you please explain what the
> requirement is?
Knowing the difference between "foo/bar" and "foo%2fbar", I guess. To be
honest, I never had that problem, but apparently some other people have.
And of course you suddenly have non-ASCII stuff in a dict value ;)
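A minimal illustration of that difference, assuming the environ only hands you the already-unquoted value (the `/files/...` paths are invented examples):

```python
from urllib.parse import unquote

# Once PATH_INFO has been unquoted, an escaped slash is
# indistinguishable from a real path separator.
raw_a = '/files/foo%2Fbar'   # one segment literally named "foo/bar"
raw_b = '/files/foo/bar'     # two segments

assert unquote(raw_a) == unquote(raw_b) == '/files/foo/bar'
```

Only the raw, still-quoted form preserves the distinction; after unquoting, routing on slashes can no longer tell the two requests apart.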
> The only example I recall was related to web servers eliminating
> repeating slashes thereby effectively not making it possible to have
> URLs in query strings without a custom encoding string. Since there
> are alternatives, I don't find that alone a compelling argument.
I don't need unquoted strings, I just think it would make sense to have
them *if possible*.