[Web-SIG] Output header encodings? (was Re: Backup plan: WSGI 1 Addenda and wsgiref update for Py3)
P.J. Eby
pje at telecommunity.com
Thu Sep 23 22:23:02 CEST 2010
At 11:17 AM 9/23/2010 -0500, Ian Bicking wrote:
>I don't see any reason why Location shouldn't be ASCII. Any header
>could have any character put in it, of course, there's just no valid
>case where Location shouldn't be a URL, and URLs are ASCII. Cookie
>can contain weirdness, yes. I would expect any library that
>abstracts cookies to handle this (it's certainly doable)...
>otherwise, this seems like one among many ways a person can do the wrong thing.
>
>This can also be detected with the validator, which doesn't avoid
>runtime errors, but bytes allow runtime errors too -- they will just
>happen somewhere else (e.g., when a value is converted to bytes in
>an application or library).
Right: somewhere much closer to the *actual* error, where the
developer can tell that the problem is "I have garbage data or have not
selected an appropriate codec", rather than "this WSGI stuff is
giving me errors someplace".
>If servers print the invalid value on error (instead of just some
>generic error) I don't think it would be that hard to track down
>problems. This requires some explicit effort on the part of the
>server (most servers handle app_iter==None ungracefully, which is a
>similar problem).
The difference is that if a server rejects non-bytes, you'll know
*right away* that your app isn't compliant, instead of having to wait
until some non-latin1 data shows up.
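
To make the timing difference concrete, here is a minimal sketch. The helper names (check_bytes_headers, encode_text_headers) and the example URLs are hypothetical, made up for illustration; no real server is claimed to work exactly this way.

```python
def check_bytes_headers(headers):
    """Strict policy (hypothetical helper): reject any non-bytes header."""
    for name, value in headers:
        if not isinstance(name, bytes) or not isinstance(value, bytes):
            raise TypeError("header %r: %r must be bytes" % (name, value))


def encode_text_headers(headers):
    """Lenient policy (hypothetical helper): accept text, encode as latin-1."""
    return [(n.encode("latin-1"), v.encode("latin-1")) for n, v in headers]


# A text header that happens to be ASCII, and one that is not latin-1.
ascii_headers = [("Location", "http://example.com/cafe")]
non_latin1_headers = [("Location", "http://example.com/\u4e2d")]

# Lenient policy: the ASCII case passes, so the app looks fine in testing.
encoded = encode_text_headers(ascii_headers)

# It only fails once non-latin1 data shows up, far from the original mistake.
try:
    encode_text_headers(non_latin1_headers)
    late_failure = None
except UnicodeEncodeError as exc:
    late_failure = exc

# Strict policy: the very first response with text headers is rejected.
try:
    check_bytes_headers(ascii_headers)
    early_failure = None
except TypeError as exc:
    early_failure = exc
```

Under the strict check the application fails on its first request, whatever the data; under the lenient one it fails only for some inputs.
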
AFAICT, there are only two advantages to using text for output headers:

1. Text is easier to work with, and
2. It's symmetric with using text for input headers.

Both of these can still be had by using the @encode_headers decorator.
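
For readers who have not seen it, a decorator along these lines might look roughly like the sketch below. Only the name encode_headers comes from this thread; the signature, the latin-1 default, and the choice to encode the status line as well are assumptions made for illustration.

```python
import functools


def encode_headers(app, charset="latin-1"):
    """Let a WSGI-style app pass text headers to a bytes-only server.

    Hypothetical sketch: the charset default and the exact behaviour
    are assumptions, not a published API.
    """
    def to_bytes(value):
        # Bytes pass through untouched; text is encoded strictly, so bad
        # data raises UnicodeEncodeError inside the app's own call.
        return value if isinstance(value, bytes) else value.encode(charset)

    @functools.wraps(app)
    def wrapper(environ, start_response):
        def encoding_start_response(status, headers, exc_info=None):
            encoded = [(to_bytes(k), to_bytes(v)) for k, v in headers]
            if exc_info is None:
                return start_response(to_bytes(status), encoded)
            return start_response(to_bytes(status), encoded, exc_info)
        return app(environ, encoding_start_response)
    return wrapper


@encode_headers
def app(environ, start_response):
    # The application keeps working with text; the wrapper converts it.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]
```

With this shape the application author keeps the convenience of text, while the server underneath only ever sees bytes, and any encoding error surfaces in the application's own call to start_response.
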
I'm a little bit on the fence on this one, because 1) it does seem a
little pointless (if harmless) to shuffle headers around in bytes
form, and 2) Location and Set-Cookie are very likely the only headers
where any kind of damage could ever happen.
But, since it *can* happen, and because it is also really easy to fix
the API issue with a decorator, I'm still leaning in favor of "output
is bytes" over "headers are text, bodies are bytes", unless somebody
can come up with either some actually-bad consequence of using bytes,
or some extra-good consequence of using text (that isn't addressed by
just using the decorator).
(Note, by the way, that WSGI design has always leaned in the
direction of "any convenience that can be handled by a library should
be", if it keeps the spec simpler and more verifiable. So, this
seems like a good use of that principle.)