On Thu, Sep 23, 2010 at 11:17 AM, Ian Bicking <span dir="ltr">&lt;<a href="mailto:ianb@colorstudy.com">ianb@colorstudy.com</a>&gt;</span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">


<div class="gmail_quote"><div class="im"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

If these headers accidentally contain non-Latin1 characters, the error isn&#39;t detectable until the header reaches the origin server doing the transmission encoding, and it&#39;ll likely be a dynamic (and therefore hard-to-debug) error.<br>


</blockquote></div><div><br>I don&#39;t see any reason why Location shouldn&#39;t be ASCII.  Any header could have any character put in it, of course, but there&#39;s no valid case where Location isn&#39;t a URL, and URLs are ASCII.  Cookie can contain weirdness, yes.  I would expect any library that abstracts cookies to handle this (it&#39;s certainly doable)... otherwise, this seems like one among many ways a person can do the wrong thing.<br>


</div></div></blockquote><div><br>Minor correction: Set-Cookie, not Cookie.  Good practice is to stick to ASCII even there (all other techniques have a high risk of mojibake), so we&#39;re really only considering legacy integration.  Note that a similar problem is using [(&#39;Content-length&#39;, len(body))], which passes an int where a string is expected and also results in a sometimes confusing error message well away from the application itself.<br>


<br>Generally, without validation, any data error surfaces away from the application.  A type error is no different from an encoding error.  Using bytes removes a possible encoding error, but IMHO has a greater chance of type errors (as bytes are not as natural as text in most cases).  Validation can check all aspects, including encoding (simply by doing a test encoding).<br>
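<br>A minimal sketch of that kind of validation, assuming text (str) headers and Latin-1 as the wire encoding.  validate_headers is a hypothetical helper for illustration, not part of WSGI or wsgiref:<br>

```python
def validate_headers(headers, encoding='latin-1'):
    """Check a WSGI-style header list up front (hypothetical helper)."""
    for name, value in headers:
        for label, item in (('name', name), ('value', value)):
            # Type check: catches mistakes like ('Content-Length', len(body)).
            if not isinstance(item, str):
                raise TypeError(
                    'header %s %r must be str, not %s'
                    % (label, item, type(item).__name__))
            # Test encoding: catches characters the wire encoding cannot carry.
            try:
                item.encode(encoding)
            except UnicodeEncodeError as exc:
                raise ValueError(
                    'header %s %r is not %s-encodable'
                    % (label, item, encoding)) from exc
    return headers
```

<br>Called from a start_response wrapper, this would raise next to the application that built the bad header, for type errors and encoding errors alike, rather than at the server doing the transmission encoding.<br>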


<br>Consider this hello world:<br><br>def app(environ, start_response):<br>    body = b&#39;Hello World&#39;<br>    start_response(b&#39;200 OK&#39;, [(b&#39;Content-Length&#39;, str(len(body)).encode(&#39;ascii&#39;))])<br>


    return [body]<br><br>str(len(body)).encode(&#39;ascii&#39;)?!?  Yuck.  Also, no 2to3 fixup can help there.  bytes(len(body)) does something weird (in Python 3 it returns that many zero bytes, not the digits).<br><br></div></div>-- <br>Ian Bicking  |  <a href="http://blog.ianbicking.org">http://blog.ianbicking.org</a><br>