[Web-SIG] Request for Comments on upcoming WSGI Changes
Chris McDonough
chrism at plope.com
Mon Sep 21 09:10:32 CEST 2009
OK, after some consideration, I think I'm sold.
Answering my own original question about why unicode seems to make sense as
values in the WSGI environment even without consideration for Python 3
compatibility: *something* needs to do this translation. Currently I
personally rely on WebOb to do a lot of this translation. I can't think of a
good reason that implementations at the level of WebOb would each need to do
this translation work; pushing the job into WSGI itself seems to make sense
here. This is particularly true for PATH_INFO and QUERY_STRING; these days
it's foolish to assume these values will be composed entirely of "low order"
(7-bit ASCII) characters, so being able to access them natively as bytes
isn't very useful.
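
To make the translation step above concrete, here is a rough sketch of the kind of work a library at the WebOb level ends up doing today. It is not WebOb's actual code and not anything the WSGI spec mandates. The helper names, the UTF-8 first guess, and the Latin-1 fallback are all assumptions made purely for illustration.

```python
# Hypothetical helpers; names and encoding policy are illustrative assumptions.

def decode_environ_value(raw, encoding="utf-8", fallback="latin-1"):
    """Decode one bytes value into text.

    Tries `encoding` first; if the bytes are not valid in that encoding,
    falls back to `fallback`. Latin-1 maps every byte to a code point,
    so the fallback decode cannot fail.
    """
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return raw.decode(fallback)


def decode_path_and_query(environ):
    """Return a copy of `environ` with PATH_INFO and QUERY_STRING as text."""
    decoded = dict(environ)
    for key in ("PATH_INFO", "QUERY_STRING"):
        value = decoded.get(key)
        if isinstance(value, bytes):
            decoded[key] = decode_environ_value(value)
    return decoded


# A bytes-valued environ, as a server that "talks bytes" might hand over.
environ = {
    "PATH_INFO": "/caf\u00e9".encode("utf-8"),
    "QUERY_STRING": b"q=1",
}
text_environ = decode_path_and_query(environ)
```

Every framework that wants text has to carry some variant of this, including its own guess about the encoding, which is the duplication that doing the job once at the WSGI level would remove.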
OTOH, I suspect the Python 3 stdlib is still broken if it requires native
strings in various places (and prohibits the use of bytes).
James Bennett wrote:
> On Sun, Sep 20, 2009 at 11:25 PM, Chris McDonough <chrism at plope.com> wrote:
>> WSGI is a fairly low-level protocol aimed at folks who need to interface a
>> server to the outside world. The outside world (by its nature) talks bytes.
>> I fear that any implied conversion of environment values and iterable
>> return values to Unicode will actually eventually make things harder than
>> they are now. I realize that it would make middleware implementors' lives
>> harder to need to deal in bytes. However, at this point, I also believe
>> that middleware kinda should be hard. We have way too much middleware that
>> shouldn't be middleware these days (some written by myself).
>
> Well, ordinarily I'd be inclined to agree: HTTP deals in bytes, so an
> interface to HTTP should deal in bytes as well.
>
> The problem, really, is that despite being a very low-level interface,
> WSGI has a tendency to leak up into much higher-level code, and (IMO)
> authors of that high-level code really shouldn't have to waste their
> time dealing with details of the underlying low-level gateway.
>
> You've said you don't want to hear "Python 3" as the reason, but it
> provides some useful examples: in high-level code you'll commonly want
> to be doing things like, say, comparing parts of the requested URL
> path to known strings or patterns. And that high-level code will
> almost certainly use strings, while WSGI, in theory, will be using
> bytes. That's just a recipe for disaster; if WSGI mandates bytes, then
> bytes will have to start "infecting" much higher-level code (since
> Python 3 -- rightly -- doesn't let you be nearly as promiscuous about
> mixing bytes and strings).
>
> Once I'm at a point where I can use Python 3, I know I'll personally
> be looking for some library which will normalize everything for me
> before I interact with it, precisely to avoid this sort of leakage; if
> WSGI itself would at least *allow* that normalization to happen at the
> low level (mandating it is another discussion entirely) I'd feel much
> happier about it going forward.
>
>