[Web-SIG] Request for Comments on upcoming WSGI Changes

Chris McDonough chrism at plope.com
Mon Sep 21 09:10:32 CEST 2009


OK, after some consideration, I think I'm sold.

Answering my own original question about why Unicode seems to make sense for 
values in the WSGI environment, even without considering Python 3 
compatibility:  *something* needs to do this translation.  Currently I 
personally rely on WebOb to do a lot of it.  I can't think of a good reason 
that every implementation at the level of WebOb should need to redo this 
translation work; pushing the job into WSGI itself seems to make sense 
here.  This is particularly true for PATH_INFO and QUERY_STRING; these days 
it's foolish to assume these values will be composed entirely of "low order" 
characters, so being able to access them natively as bytes isn't very useful.
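
To make the translation step concrete, here is a rough sketch of the kind of 
decoding that something at the WebOb level ends up doing today.  It assumes 
Python 3, an environ whose values arrive as bytes, and UTF-8 as the encoding; 
none of that is settled by any spec.  The helper name decode_environ_text is 
made up for illustration and is not a WebOb or WSGI API.

```python
from urllib.parse import parse_qs


def decode_environ_text(environ, keys=("PATH_INFO", "QUERY_STRING"),
                        encoding="utf-8"):
    """Return a copy of environ with the named bytes values decoded to str.

    Hypothetical helper: the key list and the UTF-8 default are assumptions.
    """
    decoded = dict(environ)  # leave the caller's environ untouched
    for key in keys:
        value = decoded.get(key)
        if isinstance(value, bytes):
            # errors="replace" keeps malformed input from raising here
            decoded[key] = value.decode(encoding, errors="replace")
    return decoded


# A request path and query string containing non-ASCII characters.
raw = {"PATH_INFO": b"/caf\xc3\xa9", "QUERY_STRING": b"q=na%C3%AFve"}
env = decode_environ_text(raw)

path = env["PATH_INFO"]                 # text, ready to compare to str
query = parse_qs(env["QUERY_STRING"])   # percent-decoding happens here
```

If every framework has to carry its own version of this helper, each one also 
has to pick its own encoding and error policy, which is the duplication I'd 
rather see handled once at the WSGI level.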

OTOH, I suspect the Python 3 stdlib is still broken if it requires native 
strings in various places (and prohibits the use of bytes).

James Bennett wrote:
> On Sun, Sep 20, 2009 at 11:25 PM, Chris McDonough <chrism at plope.com> wrote:
>> WSGI is a fairly low-level protocol aimed at folks who need to interface a
>> server to the outside world.  The outside world (by its nature) talks bytes.
>> I fear that any implied conversion of environment values and iterable
>> return values to Unicode will actually eventually make things harder than
>> they are now.  I realize that it would make middleware implementors' lives
>> harder to need to deal in bytes.  However, at this point, I also believe
>> that middleware kinda should be hard.  We have way too much middleware that
>> shouldn't be middleware these days (some written by myself).
> 
> Well, ordinarily I'd be inclined to agree: HTTP deals in bytes, so an
> interface to HTTP should deal in bytes as well.
> 
> The problem, really, is that despite being a very low-level interface,
> WSGI has a tendency to leak up into much higher-level code, and (IMO)
> authors of that high-level code really shouldn't have to waste their
> time dealing with details of the underlying low-level gateway.
> 
> You've said you don't want to hear "Python 3" as the reason, but it
> provides some useful examples: in high-level code you'll commonly want
> to be doing things like, say, comparing parts of the requested URL
> path to known strings or patterns. And that high-level code will
> almost certainly use strings, while WSGI, in theory, will be using
> bytes. That's just a recipe for disaster; if WSGI mandates bytes, then
> bytes will have to start "infecting" much higher-level code (since
> Python 3 -- rightly -- doesn't let you be nearly as promiscuous about
> mixing bytes and strings).
> 
> Once I'm at a point where I can use Python 3, I know I'll personally
> be looking for some library which will normalize everything for me
> before I interact with it, precisely to avoid this sort of leakage; if
> WSGI itself would at least *allow* that normalization to happen at the
> low level (mandating it is another discussion entirely) I'd feel much
> happier about it going forward.
> 
> 
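
The mismatch James describes above is easy to see in a few lines of Python 3. 
This is only an illustrative sketch; the path value is invented, and UTF-8 is 
an assumed encoding.

```python
# A path as a bytes-only gateway would hand it to the application.
path_bytes = b"/blog/2009/"

# Comparing bytes to str is never equal in Python 3; it just returns False.
same = (path_bytes == "/blog/2009/")

# Mixing the two types in a method call is rejected outright.
try:
    path_bytes.startswith("/blog/")
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Decoding once at the boundary lets ordinary string code work unchanged.
path_text = path_bytes.decode("utf-8")
matches = path_text.startswith("/blog/")
```

The silent False from the comparison is the more worrying case, since a 
routing check written with str literals would simply never match.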


