[Web-SIG] Output header encodings? (was Re: Backup plan: WSGI 1 Addenda and wsgiref update for Py3)

P.J. Eby pje at telecommunity.com
Thu Sep 23 18:06:47 CEST 2010


At 12:57 PM 9/21/2010 -0400, Ian Bicking wrote:
>On Tue, Sep 21, 2010 at 12:09 PM, P.J. Eby <pje at telecommunity.com> wrote:
>The Python 3 specific changes are to use:
>
>* ``bytes`` for I/O streams in both directions
>* ``str`` for environ keys and values
>* ``bytes`` for arguments to start_response() and write()
>
>
>This is the only thing that seems odd to me -- it seems like the 
>response should be symmetric with the request, and the request in 
>this case uses str for headers (status being header-like), and bytes 
>for the body.

So, I've given some thought to your suggestion, and, while it's true 
that most of the output headers are far less prone to ending up with 
unintended unicode content, there are at least two output headers 
that can include some sort of application content (and can therefore 
have random failures): Location and Set-Cookie.

If these headers accidentally contain non-Latin1 characters, the 
error isn't detectable until the header reaches the origin server 
doing the transmission encoding, and it'll likely be a dynamic (and 
therefore hard-to-debug) error.
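
To make that failure mode concrete, here's a minimal sketch. Note that 
server_encode_header() is a hypothetical stand-in for whatever the 
server does at transmission time (it's not part of WSGI or wsgiref), 
and I'm assuming it encodes as Latin-1:

```python
def server_encode_header(value):
    """Hypothetical stand-in for the server's transmission-time encoding."""
    return value.encode('latin1')

# A Latin-1 character goes through without complaint...
ok = server_encode_header('/caf\u00e9')

# ...but anything outside Latin-1 fails only here, far from the app code
# that built the value, and only for requests that produce such data.
try:
    server_encode_header('/\u4e2d\u6587')
    failed = False
except UnicodeEncodeError:
    failed = True
```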

However, if the output is always bytes (and this can be 
relatively-statically verified), then any such error can only occur 
*inside* the application, where the app's developer can find it more easily.
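
As a rough illustration of what that verification could look like, 
here's a sketch of a checking wrapper. The name require_bytes_headers 
is made up for this example (nothing like it is being proposed for 
wsgiref here), and it only checks the status and headers, not the body:

```python
def require_bytes_headers(app):
    """Hypothetical checker: reject non-bytes status/headers at the app boundary."""
    def checked_app(environ, start_response):
        def start(status, response_headers, exc_info=None):
            if not isinstance(status, bytes):
                raise TypeError("status must be bytes, got %r" % (status,))
            for name, value in response_headers:
                if not isinstance(name, bytes) or not isinstance(value, bytes):
                    raise TypeError(
                        "header must be bytes, got %r" % ((name, value),)
                    )
            return start_response(status, response_headers, exc_info)
        return app(environ, start)
    return checked_app
```

The check is purely type-based, so it fails on the first request that 
passes a str, regardless of which characters that str happens to contain.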

So I guess the question boils down to: would we rather make sure that 
coding errors happen *inside* applications, or would we rather make 
porting WSGI apps trivial (or nearly so)?

But I think that it's possible here to have one's cake and eat it 
too, if we require bytes for all outputs but provide a pair of 
decorators in wsgiref.util like the following:

     def encode_body(codec='utf8'):
         """Allow a WSGI app to output its response body as strings
         w/specified encoding"""
         def decorate(app):
             def encode(response):
                 try:
                     for data in response:
                         yield data.encode(codec)
                 finally:
                     if hasattr(response, 'close'):
                         response.close()
             def decorated_app(environ, start_response):
                 def start(status, response_headers, exc_info=None):
                     _write = start_response(status, response_headers, exc_info)
                     def write(data):
                         return _write(data.encode(codec))
                     return write
                 return encode(app(environ, start))
             return decorated_app
         return decorate

     def encode_headers(codec='latin1'):
         """Allow a WSGI app to output its headers as strings,
         w/specified encoding"""
         def decorate(app):
             def decorated_app(environ, start_response):
                 def start(status, response_headers, exc_info=None):
                     status = status.encode(codec)
                     response_headers = [
                         (k.encode(codec), v.encode(codec))
                         for k, v in response_headers
                     ]
                     return start_response(status, response_headers, exc_info)
                 return app(environ, start)
             return decorated_app
         return decorate

So, this seems like a win-win to me: relatively-static verification, 
errors stay in the app (or at least in the decorator), and the API is 
clean-and-easy.  Indeed, it seems likely that at least some apps that 
don't read wsgi.input themselves could be ported *just* by adding the 
appropriate decorator(s).  And, if your app is using unicode on 2.x, 
you can even use the same decorators there, for the benefit of 
2to3.  (Assuming I release an updated standalone wsgiref version with 
the decorators, of course.)
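
For instance, a str-based app might be ported roughly like this. This 
is only a sketch: the decorators are repeated in condensed form so the 
example runs on its own, and hello_app and fake_start_response are 
invented names standing in for a real app and a real server:

```python
def encode_body(codec='utf8'):
    """Condensed copy of the body decorator sketched above."""
    def decorate(app):
        def encode(response):
            try:
                for data in response:
                    yield data.encode(codec)
            finally:
                if hasattr(response, 'close'):
                    response.close()
        def decorated_app(environ, start_response):
            def start(status, response_headers, exc_info=None):
                _write = start_response(status, response_headers, exc_info)
                return lambda data: _write(data.encode(codec))
            return encode(app(environ, start))
        return decorated_app
    return decorate

def encode_headers(codec='latin1'):
    """Condensed copy of the header decorator sketched above."""
    def decorate(app):
        def decorated_app(environ, start_response):
            def start(status, response_headers, exc_info=None):
                headers = [
                    (k.encode(codec), v.encode(codec))
                    for k, v in response_headers
                ]
                return start_response(status.encode(codec), headers, exc_info)
            return app(environ, start)
        return decorated_app
    return decorate

# Hypothetical app that works entirely in str; only the decorators are new.
@encode_headers()
@encode_body()
def hello_app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain; charset=utf-8')])
    return ['Hello, w\u00f6rld\n']

# Hypothetical stand-in for a server: records what it was handed.
captured = {}
def fake_start_response(status, response_headers, exc_info=None):
    captured['status'] = status
    captured['headers'] = response_headers
    return lambda data: None

body = list(hello_app({}, fake_start_response))
```

With both decorators applied, the stand-in server only ever sees bytes 
for the status, the headers, and the body chunks, while the app itself 
is unchanged apart from the two decorator lines.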

So, unless somebody has some additional arguments on this one, I 
think I'm going to stick with bytes output.


