[Web-SIG] WSGI for Python 3

P.J. Eby pje at telecommunity.com
Sat Jul 17 01:35:21 CEST 2010


At 02:28 PM 7/16/2010 -0500, Ian Bicking wrote:
>On Fri, Jul 16, 2010 at 1:40 PM, P.J. Eby <pje at telecommunity.com> wrote:
>At 11:07 AM 7/16/2010 -0500, Ian Bicking wrote:
>And this doesn't help with Python 3: either we have byte values of 
>SCRIPT_NAME and PATH_INFO in Python 3, or we have text values.  I 
>think bytes will be more awkward to port to than text, and 
>inconsistent with other WSGI values.
>
>
>OTOH, it has the tremendous advantage of pushing the encoding 
>question onto the app (or framework) developer... who's really the 
>only one who can make the right decision for their particular 
>application.  And personally, I'd rather have clear boundaries 
>between text and bytes, such that porting (even if tedious or 
>awkward) is *consistent*, and clear as to when you're finished, not, 
>"oh, did I check to make sure I converted SCRIPT_NAME and 
>PATH_INFO... not just in my app code, but in all the library code 
>I call *from* my app?"
>
>IOW, the bytes/string discussion on Python-dev has kind of led me to 
>realize that we might just as well make the *entire* stack bytes 
>(incoming and outgoing headers *and* streams), and rewrite that bit 
>in PEP 333 about using str on "Python 3000" to say we go with bytes 
>on Python 3+ for everything that's a str in today's WSGI.
>
>
>This was my first intuition too, until I started thinking in more 
>detail about the particular values involved.  Some obviously are 
>textish, like environ['SERVER_NAME'].  Not a very useful value, but 
>definitely text.
>
>Basically all the internal strings are textish, so we're left with:
>
>wsgi.url_scheme
>SCRIPT_NAME/PATH_INFO
>QUERY_STRING
>HTTP_*, CONTENT_TYPE, CONTENT_LENGTH (headers)
>response status
>response headers (name and value)

What I'm getting at, though, is it's precisely this sort of "hm, 
which ones are bytes again?" stuff that makes you have to stop and 
*think*, i.e., it doesn't Fit My Brain<tm> any more.  ;-)

There should be one, and preferably *only* one, obvious way to do it.

And given that HTTP is inherently a bunch of bytes, bytes is the one 
obvious way.

I previously was under the impression that bytes in 3.x would be 
awkward to work with, but they support much the same string-style 
operations as str did in 2.x.  What they won't do is mix implicitly 
with text: combining bytes with a str constant raises a TypeError in 
3.x.  That means you'll encounter encoding issues *sooner*, rather 
than later.  (i.e., the minute you combine bytes inputs with your 
regular string constants).

Yes, you will also be forced to convert your return values to bytes, 
but if you've used string constants *anywhere*, then you know you'll 
be outputting text, which you should already have been encoding for 
output.  (So you'll just be forced to deal with errors on that side 
sooner as well.)

All in all, I'd say this also fits with what people on Python-Dev 
keep hammering on as the One Obvious Way to deal with bytes and 
strings in a program: i.e., bytes for I/O, text for text processing.
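
As a rough illustration of that split (a sketch, not a spec): the app 
below assumes a hypothetical all-bytes convention in which environ 
values, the status line, and the headers are all bytes.  The name 
greet_app and its encoding parameter are made up for the example; the 
point is only that decoding and encoding happen once each, at the 
edges, with an encoding the app picks.

```python
# Sketch only: assumes a hypothetical all-bytes WSGI convention
# (environ values, status, and headers are bytes). Not an existing spec.

def greet_app(environ, start_response, encoding="utf-8"):
    # I/O boundary (input): the server hands over bytes, and the app
    # decides how to decode them.
    name = environ["PATH_INFO"].decode(encoding).strip("/") or "world"

    # Text processing: everything in here is str.
    body_text = "Hello, %s!\n" % name

    # I/O boundary (output): encode once, on the way out.
    body = body_text.encode(encoding)
    start_response(b"200 OK", [
        (b"Content-Type",
         b"text/plain; charset=" + encoding.encode("ascii")),
        (b"Content-Length", str(len(body)).encode("ascii")),
    ])
    return [body]
```

If PATH_INFO isn't valid in the chosen encoding, the decode() call 
fails right at the input boundary, which is the "sooner rather than 
later" behavior described above.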

WSGI is HTTP, and HTTP is I/O, ergo, WSGI is I/O, and we should 
therefore "byte" the bullet here.  ;-)


