[Web-SIG] WSGI for Python 3
ianb at colorstudy.com
Sat Jul 17 06:38:13 CEST 2010
On Fri, Jul 16, 2010 at 9:43 PM, Chris McDonough <chrism at plope.com> wrote:
> > Nah, not nearly that hard:
> > path_info =
> > I don't see the problem? If you want to distinguish %2f from /, then
> > you'll do it slightly differently, like:
> > path_parts = [
> > urllib.parse.unquote_to_bytes(p).decode('UTF-8')
> > for p in environ['wsgi.raw_path_info'].split('/')]
> > This second recipe is impossible to do currently with WSGI.
> > So... before jumping to conclusions, what's the hard part with using
> > text?
> It's extremely hard to swallow Python 3's current disregard for the
> primacy of bytes at I/O boundaries. I'm trying, but I can't help but
> feel that the existence of an API like "unquote_to_bytes" is more
> symptom treatment than solution. Of course something that unquotes a
> URL segment unquotes it into bytes; it's the only sane default because
> URL segments found in URLs on the internet are bytes.
Yes, URL quoted strings should decode to bytes, though arguably it is
also reasonable to use the UTF-8 default that urllib.parse.quote/unquote
uses. So it's really just a question of names: whether the function
should be called quote_to_string or quote_to_bytes. Which honestly...
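
For concreteness, here is a minimal sketch of how the two stdlib functions
differ on the same input, plus the split-then-unquote recipe quoted above.
The sample segments and the helper name split_raw_path are made up for
illustration, and 'wsgi.raw_path_info' is only the key proposed in this
thread, not something WSGI currently provides.

```python
from urllib.parse import unquote, unquote_to_bytes

# Made-up sample: a UTF-8 encoded e-acute followed by a quoted slash.
segment = "caf%C3%A9%2Fmenu"
as_bytes = unquote_to_bytes(segment)  # raw bytes, no decoding applied
as_text = unquote(segment)            # decoded as UTF-8 by default

# Made-up sample that is NOT valid UTF-8 (a Latin-1 e-acute).
latin1_segment = "caf%E9"
raw = unquote_to_bytes(latin1_segment)  # the original byte survives
lossy = unquote(latin1_segment)         # default errors='replace' discards it


def split_raw_path(raw_path):
    """Hypothetical helper: split on literal '/' first, then unquote each
    part, so a quoted %2F stays inside its segment."""
    return [unquote_to_bytes(p).decode("UTF-8") for p in raw_path.split("/")]


# Stand-in environ using the key proposed in this thread.
environ = {"wsgi.raw_path_info": "/a%2Fb/c"}
path_parts = split_raw_path(environ["wsgi.raw_path_info"])
```

The point of the second pair is that the text-returning default silently
replaces the undecodable byte, while the bytes-returning function leaves the
decision to the caller.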
> So I guess the "hard part" is more meta. When you have legitimate
> backwards compatibility constraints, suboptimal choices made during
> protocol design are excusable. But it just seems really very weird to
> design one (WSGI 2) from scratch with such choices when the only reason
> to do so is a systematic low-level denial of reality. Why would we use
> (and, worse, by doing so, implicitly promote) such a system in the first
> On the other hand, indignance about the issue shouldn't rule the day
> either. To me, the most pragmatic thing to do that doesn't deny reality
> would be to use bytes. It's also the easiest thing to remember (the
> values in the environment are all bytes) and I think we'll be able to
> drive the Py3K stdlib forward in a much saner direction if we choose
> bytes than if we choose text to represent things that are naturally more
I do feel like indignance has played a part here. And in my brief forays
into Python 3 I have been frustrated by the over-textification of APIs.
But... if a compromise works let's not let those experiences color our
So, here are my criteria for resolving this particular Python 3 issue:
* We should not lose information from the request. Decoding with UTF-8
(without surrogateescape) would be an example of such a loss. URL-decoding
loses us information currently, which is why I wouldn't be sad to see it go (though
if it was only for that reason I wouldn't bother -- the unicode issue just
makes it serendipitous).
* We shouldn't produce wildly inaccurate strings. E.g., decoding something
with Latin1 when it's an implausible encoding.
* Encoding/decoding errors should only possibly happen at the application
level, or maybe middleware if you are playing around with stuff. Servers
specifically should never have them (because they can't gracefully handle
* We should avoid server configuration with respect to application policy
(we've avoided it so far, yay!)
* We should support eclectic application layouts, e.g., an application that
sometimes serves Latin-1, sometimes UTF-8 (like if the application proxies
requests or serves up legacy content/apps).
* We should make things as easy to port as possible. Errors in porting
should be loud.
* As much as possible WSGI should be readable and usable. Maybe most people
will use a library, but we also have a lot of libraries that handle WSGI,
and it's nice that that's been able to happen, so we don't want to make things
any harder than they have to be. E.g., clearly we should use text environ
keys (luckily we don't have to worry about non-ASCII header names, I guess?)
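
The first two criteria can be illustrated with a minimal sketch of how
different decodings treat the same request bytes. The byte string below is a
made-up example mixing a Latin-1 byte with a UTF-8 sequence; nothing here is
proposed WSGI behavior, only what the stdlib codecs do.

```python
# Made-up request bytes: a Latin-1 e-acute, then a UTF-8 encoded snowman.
raw = b"/caf\xe9/\xe2\x98\x83"

# Strict UTF-8 raises on the Latin-1 byte, so a server decoding this way
# would hit an error it cannot handle gracefully.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# errors="replace" never raises, but the original byte is gone for good.
replaced = raw.decode("utf-8", "replace")

# surrogateescape never raises and round-trips exactly.
escaped = raw.decode("utf-8", "surrogateescape")
escaped_back = escaped.encode("utf-8", "surrogateescape")

# Latin-1 also round-trips exactly, but the UTF-8 part of the string comes
# out as three unrelated characters instead of one snowman.
latin = raw.decode("latin-1")
latin_back = latin.encode("latin-1")

# An application that knows better can recover the bytes and re-decode.
redecoded = escaped_back.decode("latin-1")
```

Both surrogateescape and Latin-1 preserve the bytes; they differ in how
plausible the intermediate string looks and in how loudly a mistake shows up
when the string is later encoded or displayed.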
Ian Bicking | http://blog.ianbicking.org