[Web-SIG] Output header encodings? (was Re: Backup plan: WSGI 1 Addenda and wsgiref update for Py3)

Thu Sep 23 22:23:02 CEST 2010

At 11:17 AM 9/23/2010 -0500, Ian Bicking wrote:
>I don't see any reason why Location shouldn't be ASCII.  Any header 
>could have any character put in it, of course, there's just no valid 
>case where Location shouldn't be a URL, and URLs are ASCII.  Cookie 
>can contain weirdness, yes.  I would expect any library that 
>abstracts cookies to handle this (it's certainly doable)... 
>otherwise, this seems like one among many ways a person can do the wrong thing.
>
>This can also be detected with the validator, which doesn't avoid 
>runtime errors, but bytes allow runtime errors too -- they will just 
>happen somewhere else (e.g., when a value is converted to bytes in 
>an application or library).

Right: somewhere much closer to the *actual* error, where the 
developer can tell that the problem is "I have garbage data or have 
not selected an appropriate codec", rather than "this WSGI stuff is 
giving me errors someplace".

>If servers print the invalid value on error (instead of just some 
>generic error) I don't think it would be that hard to track down 
>problems.  This requires some explicit effort on the part of the 
>server (most servers handle app_iter==None ungracefully, which is a 
>similar problem).

The difference is that if a server rejects non-bytes, you'll know 
*right away* that your app isn't compliant, instead of having to wait 
until some non-latin1 data shows up.
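
To make the timing difference concrete, here is a rough sketch. It is 
not taken from any real server; require_bytes_headers and 
encode_text_headers are made-up names, and latin-1 is assumed as the 
codec a text-accepting server would apply.

```python
def require_bytes_headers(status, headers):
    """Hypothetical check for a bytes-only server: reject str at once."""
    if not isinstance(status, bytes):
        raise TypeError("status must be bytes, got %r" % (status,))
    for name, value in headers:
        if not isinstance(name, bytes) or not isinstance(value, bytes):
            raise TypeError("header must be bytes, got %r" % ((name, value),))


def encode_text_headers(headers):
    """Hypothetical text-accepting server: encode as latin-1 when sending."""
    return [(n.encode("latin-1"), v.encode("latin-1")) for n, v in headers]


harmless = [("Location", "/home")]
non_latin1 = [("Location", "/\u4e2d\u6587")]

# Text-accepting server: harmless data passes, so the bug stays hidden...
encode_text_headers(harmless)

# ...and only surfaces once non-latin1 data shows up.
try:
    encode_text_headers(non_latin1)
except UnicodeEncodeError as exc:
    late_error = exc

# Bytes-only server: the very first request fails, even with harmless data.
try:
    require_bytes_headers("302 Found", harmless)
except TypeError as exc:
    early_error = exc
```

The text-accepting path only fails on the second, data-dependent case, 
while the bytes-only check fails on the first request the app ever serves.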

AFAICT, there are only two advantages to using text for output headers:

1. Text is easier to work with, and
2. It's symmetric with using text for input headers.

Both of which can still be had, by using the @encode_headers decorator.
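
For illustration only, a decorator along those lines might look roughly 
like the sketch below. This is an assumed shape, not a settled 
implementation: the latin-1 default and the helper names are guesses 
made for the example.

```python
import functools


def encode_headers(app, encoding="latin-1"):
    """Let a WSGI app emit text status/headers; the server sees bytes.

    Sketch only: the default codec is an assumption.
    """
    def to_bytes(value):
        # Leave bytes alone so already-compliant apps still work.
        return value.encode(encoding) if isinstance(value, str) else value

    @functools.wraps(app)
    def wrapper(environ, start_response):
        def encoding_start_response(status, headers, exc_info=None):
            # Encoding happens here, inside the app's own call stack, so a
            # UnicodeEncodeError points at the app's data, not the server.
            byte_headers = [(to_bytes(k), to_bytes(v)) for k, v in headers]
            if exc_info is not None:
                return start_response(to_bytes(status), byte_headers, exc_info)
            return start_response(to_bytes(status), byte_headers)
        return app(environ, encoding_start_response)
    return wrapper


@encode_headers
def hello_app(environ, start_response):
    # The app works in text; the wrapper hands bytes to the server.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello, world!\n"]
```

The app keeps the convenience of text, and the spec only has to say 
"output is bytes".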

I'm a little bit on the fence on this one, because 1) it does seem a 
little pointless (if harmless) to shuffle headers around in bytes 
form, and 2) Location and Set-Cookie are very likely the only headers 
where any kind of damage could ever happen.

But, since it *can* happen, and because it is also really easy to fix 
the API issue with a decorator, I'm still leaning in favor of "output 
is bytes" over "headers are text, bodies are bytes", unless somebody 
can come up with either some actually-bad consequence of using bytes, 
or some extra-good consequence of using text (that isn't addressed by 
just using the decorator).

(Note, by the way, that WSGI design has always leaned in the 
direction of "any convenience that can be handled by a library should 
be", if it keeps the spec simpler and more verifiable.  So, this 
seems like a good use of that principle.)