[Web-SIG] Output header encodings? (was Re: Backup plan: WSGI 1 Addenda and wsgiref update for Py3)

Thu Sep 23 18:17:32 CEST 2010

On Thu, Sep 23, 2010 at 11:06 AM, P.J. Eby <pje at telecommunity.com> wrote:

> At 12:57 PM 9/21/2010 -0400, Ian Bicking wrote:
>
>> On Tue, Sep 21, 2010 at 12:09 PM, P.J. Eby <pje at telecommunity.com> wrote:
>> The Python 3 specific changes are to use:
>>
>> * ``bytes`` for I/O streams in both directions
>> * ``str`` for environ keys and values
>> * ``bytes`` for arguments to start_response() and write()
>>
>>
>> This is the only thing that seems odd to me -- it seems like the response
>> should be symmetric with the request, and the request in this case uses str
>> for headers (status being header-like), and bytes for the body.
>>
>
> So, I've given some thought to your suggestion, and, while it's true that
> most of the output headers are far less prone to ending up with unintended
> unicode content, there are at least two output headers that can include some
> sort of application content (and can therefore have random failures):
> Location and Set-Cookie.
>
> If these headers accidentally contain non-Latin1 characters, the error
> isn't detectable until the header reaches the origin server doing the
> transmission encoding, and it'll likely be a dynamic (and therefore
> hard-to-debug) error.
>

I don't see any reason why Location shouldn't be ASCII.  Any header could
have any character put in it, of course, but there's no valid case where
Location shouldn't be a URL, and URLs are ASCII.  Set-Cookie can contain
weirdness, yes.  I would expect any library that abstracts cookies to handle
this (it's certainly doable)... otherwise, this seems like one among many
ways a person can do the wrong thing.
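
To illustrate the Location point, here is a minimal sketch (not part of any
proposal; `ascii_location` is a hypothetical helper name) of how a non-ASCII
path ends up as a pure-ASCII URL once it is percent-encoded:

```python
from urllib.parse import quote


def ascii_location(url_path):
    """Percent-encode a path so it can be used in a Location header.

    Hypothetical helper for illustration only: non-ASCII characters are
    encoded as UTF-8 and then percent-escaped, so the result is ASCII.
    """
    return quote(url_path, safe="/")


location = ascii_location("/caf\u00e9")
# The result contains only ASCII characters, so this does not raise.
location.encode("ascii")
```

Anything that builds redirect URLs this way never hands a non-Latin1
character to the server in the first place.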

This can also be detected with the validator, which doesn't avoid runtime
errors, but bytes allow runtime errors too -- they will just happen
somewhere else (e.g., when a value is converted to bytes in an application
or library).

If servers print the invalid value on error (instead of just some generic
error) I don't think it would be that hard to track down problems.  This
requires some explicit effort on the part of the server (most servers handle
app_iter==None ungracefully, which is a similar problem).
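
As a rough sketch of what "print the invalid value" could look like on the
server side (assuming str headers and a Latin-1 wire encoding;
`encode_headers_for_wire` is a hypothetical name, not an existing wsgiref
function):

```python
def encode_headers_for_wire(response_headers, codec="latin-1"):
    """Encode (name, value) str pairs to bytes for transmission.

    Hypothetical server-side helper: on failure it raises an error that
    names the offending header and shows its value, instead of a bare
    UnicodeEncodeError with no context.
    """
    encoded = []
    for name, value in response_headers:
        try:
            encoded.append((name.encode(codec), value.encode(codec)))
        except UnicodeEncodeError as exc:
            raise ValueError(
                "header %r has value %r that cannot be encoded as %s"
                % (name, value, codec)
            ) from exc
    return encoded
```

With a message like that in the server log, the application author can see
which header and which value caused the failure.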

> However, if the output is always bytes (and this can be
> relatively-statically verified), then any error can't occur except *inside*
> the application, where the app's developer can find it more easily.
>
> So I guess the question boils down to: would we rather make sure that
> coding errors happen *inside* applications, or would we rather make porting
> WSGI apps trivial (or nearly so)?
>
> But I think that it's possible here to have one's cake and eat it too: if
> we require bytes for all outputs, but provide a pair of decorators in
> wsgiref.util like the following:
>
>    def encode_body(codec='utf8'):
>        """Allow a WSGI app to output its response body as strings
>        w/specified encoding"""
>        def decorate(app):
>            def encode(response):
>                try:
>                    for data in response:
>                        yield data.encode(codec)
>                finally:
>                    if hasattr(response, 'close'):
>                        response.close()
>            def decorated_app(environ, start_response):
>                def start(status, response_headers, exc_info=None):
>                    _write = start_response(status, response_headers,
>                                            exc_info)
>                    def write(data):
>                        return _write(data.encode(codec))
>                    return write
>                return encode(app(environ, start))
>            return decorated_app
>        return decorate
>
>    def encode_headers(codec='latin1'):
>        """Allow a WSGI app to output its headers as strings, w/specified
>        encoding"""
>        def decorate(app):
>            def decorated_app(environ, start_response):
>                def start(status, response_headers, exc_info=None):
>                    status = status.encode(codec)
>                    response_headers = [
>                        (k.encode(codec), v.encode(codec))
>                        for k, v in response_headers
>                    ]
>                    return start_response(status, response_headers,
>                                          exc_info)
>                return app(environ, start)
>            return decorated_app
>        return decorate
>
> So, this seems like a win-win to me: relatively-static verification, errors
> stay in the app (or at least in the decorator), and the API is
> clean-and-easy.  Indeed, it seems likely that at least some apps that don't
> read wsgi.input themselves could be ported *just* by adding the appropriate
> decorator(s).  And, if your app is using unicode on 2.x, you can even use
> the same decorators there, for the benefit of 2to3.  (Assuming I release an
> updated standalone wsgiref version with the decorators, of course.)
>

This doesn't seem that different than the validator, except that the
decorator uses a different interface internally and externally (the internal
interface using text, the external one bytes).
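
For comparison, a validator-style wrapper might look roughly like the sketch
below. This is not wsgiref.validate; `check_str_headers` is a hypothetical
name, and it assumes the str-headers variant of the spec. It performs the
same encodability check, but the interface is text on both sides:

```python
def check_str_headers(app, codec="latin-1"):
    """Wrap a WSGI app and verify its str headers are encodable.

    Hypothetical middleware for illustration: the status and headers are
    checked and then passed through unchanged, still as str.
    """
    def checked_app(environ, start_response):
        def start(status, response_headers, exc_info=None):
            # Each encode() raises UnicodeEncodeError on a bad value;
            # the encoded results are discarded.
            status.encode(codec)
            for name, value in response_headers:
                name.encode(codec)
                value.encode(codec)
            return start_response(status, response_headers, exc_info)
        return app(environ, start)
    return checked_app
```

In both cases the failure surfaces at the start_response() call made by the
application, so the traceback points at application code.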

-- 
Ian Bicking  |  http://blog.ianbicking.org