[Web-SIG] WSGI for Python 3

Sat Jul 17 04:43:26 CEST 2010

On Fri, 2010-07-16 at 20:46 -0500, Ian Bicking wrote:
> On Fri, Jul 16, 2010 at 6:20 PM, Chris McDonough <chrism at plope.com>
> wrote:
>         > What are the concrete problems you envision with text
>         > request headers, text (URL-quoted) path, and text
>         > response status and headers?
>         
>         Documentation is the main reason.  For example, the
>         documentation for making sense of path_info segments in
>         a WSGI that used unicodey-strings would, as I understand
>         it, read something like this:
> 
> Nah, not nearly that hard:
> 
> path_info = urllib.parse.unquote_to_bytes(
>     environ['wsgi.raw_path_info']).decode('UTF-8')
> 
> I don't see the problem?  If you want to distinguish %2f from /, then
> you'll do it slightly differently, like:
> 
> path_parts = [
>     urllib.parse.unquote_to_bytes(p).decode('UTF-8')
>     for p in environ['wsgi.raw_path_info'].split('/')]
>  
> This second recipe is impossible to do currently with WSGI.
> 
> So... before jumping to conclusions, what's the hard part with using
> text?
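For concreteness, the two quoted recipes can be run against a sample path. This is a minimal sketch, assuming a hypothetical environ in which `wsgi.raw_path_info` (the key proposed in this thread, not part of PEP 333) holds the still-quoted path as a native str. It shows the difference between unquoting first and splitting first when the path contains an encoded `%2F`.

```python
import urllib.parse

# Hypothetical environ: 'wsgi.raw_path_info' is the proposed key holding
# the still-quoted request path; the sample value is made up.
environ = {'wsgi.raw_path_info': '/files/a%2Fb/caf%C3%A9'}

# Recipe 1: unquote the whole path, then decode.  The encoded %2F turns
# into a literal '/', so "a%2Fb" can no longer be told apart from "a/b".
path_info = urllib.parse.unquote_to_bytes(
    environ['wsgi.raw_path_info']).decode('UTF-8')

# Recipe 2: split on the literal '/' first, then unquote each segment,
# so an encoded %2F stays inside its own segment.
path_parts = [
    urllib.parse.unquote_to_bytes(p).decode('UTF-8')
    for p in environ['wsgi.raw_path_info'].split('/')]

print(path_info)   # /files/a/b/café
print(path_parts)  # ['', 'files', 'a/b', 'café']
```

The second recipe only works because the raw, still-quoted path is available; with the already-unquoted `PATH_INFO` of current WSGI the segment boundary is gone before the application sees it.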

It's extremely hard to swallow Python 3's current disregard for the
primacy of bytes at I/O boundaries.  I'm trying, but I can't help but
feel that the existence of an API like "unquote_to_bytes" is more
symptom treatment than solution.  Of course something that unquotes a
URL segment unquotes it into bytes; it's the only sane default because
URL segments found in URLs on the internet are bytes.
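The point about unquoting can be illustrated with the standard library's own functions. A minimal sketch, assuming a hypothetical segment whose "é" was percent-encoded as Latin-1 rather than UTF-8: `unquote_to_bytes()` returns exactly the octets that were sent, while `unquote()` has to guess an encoding and, by default, replaces what it cannot decode.

```python
from urllib.parse import unquote, unquote_to_bytes

# Hypothetical segment whose "é" was percent-encoded as Latin-1 (0xE9),
# not UTF-8 (0xC3 0xA9).
segment = 'caf%E9'

# Unquoting to bytes is lossless: these are exactly the octets on the wire.
raw = unquote_to_bytes(segment)
print(raw)  # b'caf\xe9'

# Unquoting straight to text must assume an encoding; unquote() defaults
# to encoding='utf-8', errors='replace', so the byte is silently replaced.
text = unquote(segment)
print(text == 'caf\ufffd')  # True

# With the bytes in hand, the application can still pick the right codec.
print(raw.decode('latin-1'))  # café
```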

So I guess the "hard part" is more meta.  When you have legitimate
backwards compatibility constraints, suboptimal choices made during
protocol design are excusable.  But it seems very strange to
design one (WSGI 2) from scratch with such choices when the only reason
to do so is a systematic low-level denial of reality.  Why would we use
(and, worse, by doing so, implicitly promote) such a system in the first
place?

On the other hand, indignation about the issue shouldn't rule the day
either.  To me, the most pragmatic thing to do that doesn't deny reality
would be to use bytes.  It's also the easiest thing to remember (the
values in the environment are all bytes) and I think we'll be able to
drive the Py3K stdlib forward in a much saner direction if we choose
bytes than if we choose text to represent things that are naturally more
bytes-like.
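Under an "everything in the environment is bytes" design, the application would split and unquote bytes and decide on a text encoding itself. This is only a sketch of that idea: the environ below, its key names and values, and the UTF-8-then-Latin-1 fallback policy are all illustrative assumptions, not part of any spec.

```python
from urllib.parse import unquote_to_bytes

# Hypothetical "all values are bytes" environ; key names and values are
# illustrative assumptions only.
environ = {
    'REQUEST_METHOD': b'GET',
    'wsgi.raw_path_info': b'/files/a%2Fb/caf%E9',
}

def path_segments(environ):
    """Split the raw path on b'/' first, then unquote each segment.

    Returns bytes; no text encoding is assumed at this layer.
    """
    raw = environ['wsgi.raw_path_info']
    return [unquote_to_bytes(seg) for seg in raw.split(b'/')]

def decode_segment(seg):
    """Application-chosen policy (hypothetical): try UTF-8, fall back to Latin-1."""
    try:
        return seg.decode('utf-8')
    except UnicodeDecodeError:
        return seg.decode('latin-1')

segments = path_segments(environ)
print(segments)                               # [b'', b'files', b'a/b', b'caf\xe9']
print([decode_segment(s) for s in segments])  # ['', 'files', 'a/b', 'café']
```

The design choice here is that the server never decodes anything, so the only place an encoding guess happens is in code the application author wrote and can see.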

- C