On Sun, Jan 9, 2011 at 1:47 AM, Stephen J. Turnbull <span dir="ltr">&lt;<a href="mailto:stephen@xemacs.org">stephen@xemacs.org</a>&gt;</span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">


<div class="im">Robert Brewer writes:<br>

<br>

 &gt; Python 3.1 was released June 27th, 2009. We&#39;re coming up faster on the<br>

 &gt; two-year period than we seem to be on a revised WSGI spec. Maybe we<br>

 &gt; should shoot for a &quot;bytes of a known encoding&quot; type first.<br>

<br>

</div>You have one.  It&#39;s called &quot;ISO 2022: Information processing -- ISO<br>

7-bit and 8-bit coded character sets -- Code extension techniques&quot;.<br>

The popularity of that standard speaks for itself.<br></blockquote><div><br>The kind of object PJE was referring to is more like Ruby&#39;s strings, which do not embed the encoding in the bytes themselves but carry it as an annotation on the bytes, and do lazy transcoding when combining strings of different encodings.  The goal with respect to WSGI is that you could annotate bytes with an encoding, but also change or fix that annotation if out-of-band information showed that you got the encoding wrong (e.g., some data is submitted in the encoding of the page the browser was on, so nothing inside the request itself indicates the encoding of the data).  Latin1 is kind of the poor man&#39;s version of this -- it&#39;s a good first guess at an encoding, and at worst it requires transcoding that can be done in a predictable way, since decoding bytes as Latin1 never fails and can always be reversed.  (Personally I think Latin1 gets us 99% of the way there, so bytes-of-a-known-encoding are not really that important for the WSGI case.)<br>
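<br>To make the Latin1 point concrete, here is a rough sketch in Python 3 of fixing a wrong guess after the fact.  The helper name <code>redecode</code> and the sample value are made up for illustration; they are not part of WSGI or of any proposal in this thread.<br>

```python
def redecode(text, actual_encoding, assumed_encoding="latin-1"):
    """Undo a provisional decode and redo it with the right encoding.

    Hypothetical helper: assumes `text` was produced by decoding the
    original bytes with `assumed_encoding` (Latin1 by default).
    """
    # Encoding back to Latin1 recovers the original bytes exactly,
    # because Latin1 maps each byte 0-255 to one code point.
    original_bytes = text.encode(assumed_encoding)
    return original_bytes.decode(actual_encoding)


# Bytes as they might arrive on the wire (really UTF-8, but we don't know yet).
raw = "café".encode("utf-8")

# The provisional Latin1 decode never raises; the text is just mojibake.
guessed = raw.decode("latin-1")

# Later, out-of-band information says the data was UTF-8.
fixed = redecode(guessed, "utf-8")
```

The round trip only works because the first guess was Latin1; a provisional decode with an encoding that can fail or lose bytes would not be reversible in the same way.<br>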


<br>  Ian<br><br></div></div>