<br><br><div class="gmail_quote">On Wed, May 25, 2011 at 11:15 PM, Stefan Behnel <span dir="ltr"><<a href="mailto:stefan_ml@behnel.de">stefan_ml@behnel.de</a>></span> wrote:<br><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
Yes, Unicode was specifically designed to support that. The first 128 code points are identical with the ASCII encoding, the first 256 code points are identical with the Latin-1 encoding.<br>
<br>
See also PEP 393, which exploits this feature.<br>
<br>
<a href="http://www.python.org/dev/peps/pep-0393/" target="_blank">http://www.python.org/dev/peps/pep-0393/</a><br>
<br>
That being said, I don't see the point of aliasing "latin-1" to "bytes" in the codecs. That sounds confusing to me. </blockquote></div><br><div>"bytes" is probably the wrong name for it, but I think using some name to signal "I'm not really using this encoding, I just need to be able to pass these bytes into and out of a string without losing any bits" might be better than using "latin-1" if we're forced to take up this hack. (My gut feeling is that it would be better if we could avoid the "latin-1" hack altogether, but apparently wiser minds than mine have decided we have no other choice.) Maybe we could call it "passthrough"? And we could add a documentation note that if you use "passthrough" to decode some bytes you must, must, must use it to encode them later, since the string you manipulate won't really contain Unicode code points, just a transparent byte encoding…</div>
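<div>For anyone following along, a minimal sketch of the round-trip property being discussed (no hypothetical "passthrough" codec exists; this just shows why "latin-1" works as one):</div>

```python
# Every possible byte value round-trips through latin-1 unchanged, because
# code points 0-255 map one-to-one onto bytes 0-255.
data = bytes(range(256))

# Decoding does not interpret the bytes as text in any meaningful sense;
# it just stores each byte value as the same-numbered code point.
s = data.decode("latin-1")

# Encoding with the same codec recovers the original bytes exactly.
assert s.encode("latin-1") == data

# Contrast: a real text codec would reject arbitrary bytes.
try:
    data.decode("utf-8")
except UnicodeDecodeError:
    print("utf-8 refuses arbitrary bytes; latin-1 never does")
```

<div>The caveat in the paragraph above is visible here: the string <code>s</code> is only meaningful as a byte container, so re-encoding it with any codec other than latin-1 would corrupt the data.</div>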
<div><br></div><div>-- Carl</div>