[Web-SIG] Request for Comments on upcoming WSGI Changes

Tue Sep 22 08:43:23 CEST 2009

It's not a specific proposal, but here are my opinions on what a proposal
should be:

On Tue, Sep 22, 2009 at 1:06 AM, Mark Nottingham <mnot at mnot.net> wrote:

> OK, that's quite exhaustive.
>
> For the benefit of those of us jumping in, could you summarise your
> proposal in something like the following manner:
>
> 1. How the request method is made available to WSGI applications
>

Graham talked about it as bytes/unicode/native, where native is unicode on
Python 3 and str on Python 2.  For instance, I think there's general
consensus (though not really specifically discussed) that environ keys
should be native.

I think method should be native.
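To make "native" concrete: on Python 3 a server would have to turn the raw request-line bytes into `str` before putting them in the environ. This is a minimal sketch under two assumptions: the server decodes with Latin-1 (one of the options discussed in this thread, not a settled choice), and `to_native` is a hypothetical helper name.

```python
import sys


def to_native(raw):
    """Turn raw bytes read from the socket into the native str type.

    Assumption: on Python 3 the server decodes with Latin-1, which maps every
    byte 0-255 to exactly one code point, so decoding can never fail.
    """
    if sys.version_info[0] >= 3:
        return raw.decode("latin-1")
    return raw  # on Python 2 the native str type already is bytes


request_line = b"GET /index.html HTTP/1.1"
method, target, version = request_line.split(b" ")

# Environ keys are native; under this proposal, so is REQUEST_METHOD.
environ = {"REQUEST_METHOD": to_native(method)}
```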

> 2. How the request-uri is made available to WSGI applications -- in
> particular, whether any decoding of punycode and/or %-escapes happens
>

Hah, didn't even think about de-punycoding HTTP_HOST.  That'd be a blast.

I think:
* scheme as native
* HTTP_HOST as native (no decoding of punycode)
* path as native (no URL decoding) - big break with WSGI 1 and CGI, but what
the hell.  I could easily waffle on this.
* query string as native - *should* be ASCII-safe currently.

Wow, that was easy!
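If the URL pieces stay native and undecoded, rebuilding the request URL becomes plain concatenation. This is a sketch that assumes the standard WSGI 1 key names would carry the undecoded values; `request_uri` is a hypothetical helper, not part of any spec.

```python
def request_uri(environ):
    """Rebuild the request URL from native-string environ values.

    Assumes the proposal above: PATH_INFO keeps its %-escapes and HTTP_HOST
    keeps any punycode, so nothing here unquotes or IDNA-decodes.
    """
    url = environ["wsgi.url_scheme"] + "://" + environ["HTTP_HOST"]
    url += environ.get("SCRIPT_NAME", "") + environ.get("PATH_INFO", "")
    if environ.get("QUERY_STRING"):
        url += "?" + environ["QUERY_STRING"]
    return url


environ = {
    "wsgi.url_scheme": "http",
    "HTTP_HOST": "xn--bcher-kva.example",  # punycode (IDNA) host, left undecoded
    "SCRIPT_NAME": "",
    "PATH_INFO": "/caf%C3%A9/a%2Fb",       # %-escapes left in place
    "QUERY_STRING": "q=1",
}
```

One side effect of skipping URL decoding: `%2F` stays distinguishable from a literal `/`, which is not possible once the path has been unquoted.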

Request headers, which you didn't split out... those I'm not sure.  I'd
*like* them to be native.  But damn, I'm just not sure quite how.
surrogateescape?  Latin1?  Latin1 as a kind of poor man's surrogateescape
isn't so bad.  And the headers *should* be ASCII for sane requests, so it's
not a horrible compromise.  I guess libraries could lazily transcode, just
like they currently lazily decode.  But it'd be a bit obnoxious at the
library level.  Transcoding middleware would be easier, but it adds the
question of how to record that the transcoding has taken place.
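The Latin-1 trick works because decoding bytes as Latin-1 never fails and re-encoding restores the original bytes exactly, so a library can transcode later. This is a sketch of both the lazy approach and a transcoding middleware; `header_text`, `transcode_headers` and the environ key `example.headers_charset` (used to record that transcoding happened) are all invented for illustration.

```python
def header_text(native_value, charset="utf-8"):
    """Lazily recover real text from a Latin-1-decoded header value.

    Re-encoding with Latin-1 restores the exact bytes the client sent; only
    then is the preferred charset applied (raises UnicodeDecodeError if the
    bytes are not valid in that charset).
    """
    return native_value.encode("latin-1").decode(charset)


def transcode_headers(app, charset="utf-8"):
    """Hypothetical middleware that transcodes every HTTP_* value up front.

    The key "example.headers_charset" is made up; it records that the
    transcoding already happened so that it is not applied twice.
    """
    def wrapper(environ, start_response):
        if "example.headers_charset" not in environ:
            for key in [k for k in environ if k.startswith("HTTP_")]:
                environ[key] = header_text(environ[key], charset)
            environ["example.headers_charset"] = charset
        return app(environ, start_response)
    return wrapper


raw = "Café".encode("utf-8")                      # as received: b'Caf\xc3\xa9'
native = raw.decode("latin-1")                    # never fails: 'CafÃ©'
escaped = raw.decode("ascii", "surrogateescape")  # 'Caf\udcc3\udca9'
```

Compared with `surrogateescape`, Latin-1 yields ordinary code points rather than lone surrogates, which is what makes it a "poor man's" equivalent that still round-trips losslessly.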

> 3. How request headers are made available to WSGI apps
>

Request handlers?  I don't understand your terminology.

> 4. How the request body is made available to WSGI apps
>

Ugh.  wsgi.input could remain.  I think at least it should become a proper
file-like interface (i.e., returning an empty byte string when the content is
exhausted), and I might even ask that it implement .tell() (.seek() would be
nice of course, but optional).  If someone has an alternative, I think
there's room for improvement on both wsgi.input and the file interface.

wsgi.input should definitely work with bytes only.  I believe this is
consensus.
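A server could meet these requirements by wrapping the socket stream and capping reads at CONTENT_LENGTH. This is a sketch, not a spec: `LimitedInput` is a hypothetical name, and the socket is simulated with `io.BytesIO`. Reads return `b""` once the body is used up instead of blocking on the socket, and `tell()` reports how many body bytes have been consumed.

```python
import io


class LimitedInput(object):
    """Bytes-only, file-like wrapper that never reads past CONTENT_LENGTH."""

    def __init__(self, stream, content_length):
        self._stream = stream
        self._remaining = content_length
        self._pos = 0

    def _clamp(self, size):
        # Never ask the underlying stream for more than the body holds.
        if size is None or size < 0 or size > self._remaining:
            return self._remaining
        return size

    def read(self, size=-1):
        if self._remaining <= 0:
            return b""  # exhausted: behave like a file at EOF
        data = self._stream.read(self._clamp(size))
        self._remaining -= len(data)
        self._pos += len(data)
        return data

    def readline(self, size=-1):
        if self._remaining <= 0:
            return b""
        line = self._stream.readline(self._clamp(size))
        self._remaining -= len(line)
        self._pos += len(line)
        return line

    def tell(self):
        return self._pos


# The trailing bytes stand in for a pipelined next request on the same
# connection; they must not be consumed as part of this body.
sock = io.BytesIO(b"name=value\nGET /next HTTP/1.1")
body = LimitedInput(sock, content_length=11)
```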

> 5. Likewise for how apps should expose the response status message, headers
> and body to WSGI implementations.
>

I believe there is consensus that the response body should remain an
iterator that yields bytes.
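For reference, an application following that rule encodes its own text and returns an iterable of bytes. In this sketch the status and header values are passed as native `str`, which is an assumption, since the thread below leaves their type open.

```python
def app(environ, start_response):
    # The application encodes its own text; the server only ever sees bytes.
    body = "Hello, wörld\n".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]  # any iterable of bytes would do, e.g. a generator
```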

In one way, it'd be nice if we'd just say that status/headers should be
ASCII, because that's the reasonable choice.  But for proxying or
representing "HTTP as it is", it's not always the case.  And I'm committed
to keeping WSGI fully capable of representing arbitrary requests and
responses so long as they aren't entirely diabolical.

But an ASCII status is not unreasonable, especially since the reason phrase
carries zero semantic meaning.  Which makes native strings perfectly fine.

So, headers...

Well, Latin1 is easy enough.  In theory, or at least particular theories,
headers can be Latin1.  And you can represent arbitrary bytes that way.  So
if you want to send crazy stuff to the browser, you can do it that way.  And
if you want to stick to plain ASCII then that's easy enough as well.  So...
native?  str or unicode?  I'm not sure specifically for this one.
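On the server side, the choices above come down to two encode calls. This is a sketch assuming an ASCII-only status and Latin-1 header names and values; `serialize_head` is a hypothetical helper, not an existing API. Latin-1 maps each code point U+0000 to U+00FF to a single byte, so arbitrary bytes can still be sent.

```python
def serialize_head(status, headers):
    """Turn a native-string status and header list into response-head bytes.

    Assumptions: the status must be pure ASCII (a strict encode raises
    UnicodeEncodeError otherwise), and header names and values are encoded
    as Latin-1, one byte per code point.
    """
    lines = [b"HTTP/1.1 " + status.encode("ascii")]
    for name, value in headers:
        lines.append(name.encode("latin-1") + b": " + value.encode("latin-1"))
    return b"\r\n".join(lines) + b"\r\n\r\n"
```

A header value containing a code point above U+00FF fails loudly with `UnicodeEncodeError` rather than being silently mangled, so "crazy stuff" has to be expressed as Latin-1-decoded bytes on purpose.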

-- 
Ian Bicking  |  http://blog.ianbicking.org  |
http://topplabs.org/civichacker