<div class="gmail_quote">On Wed, Apr 29, 2009 at 23:03, Terry Reedy <span dir="ltr">&lt;<a href="mailto:tjreedy@udel.edu" target="_blank">tjreedy@udel.edu</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<div>Thomas Breuel wrote:<br>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<br>
Sure. However, that requires you to provide meaningful, reproducible<br>
counter-examples, rather than a stenographic formulation that might<br>
hint some problem you apparently see (which I believe is just not<br>
there).<br>
<br>
<br>
Well, here's another one: PEP 383 would disallow UTF-8 encodings of half surrogates. <br>
</blockquote>
<br></div>
By my reading, the current Unicode 5.1 definition of 'UTF-8' disallows that.</blockquote><div><br>If conformance to Unicode 5.1 is the basis for our discussion, then PEP 383 is off the table anyway. I'm all for strict Unicode compliance, but apparently the Python community doesn't care.<br>
<br>CESU-8 is described in Unicode Technical Report #26, so it at least has some official recognition. More importantly, it is also widely used. So my question is: what are the implications of PEP 383 for CESU-8 encoded data in Python?<br>
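<br>To make the question concrete, here is a minimal sketch of how CESU-8 differs from UTF-8 for a character outside the BMP, and what a strict UTF-8 decoder does with it. The helper cesu8_encode is hypothetical (written for illustration, not a standard library codec), and the sketch assumes a Python 3 release in which PEP 383 is available as the "surrogateescape" error handler.<br>

```python
# Illustrative sketch only. cesu8_encode is a hypothetical helper, not a stdlib codec.
# Assumes a Python 3 release that provides the "surrogateescape" and
# "surrogatepass" error handlers.

def cesu8_encode(text: str) -> bytes:
    """Encode text as CESU-8: each UTF-16 code unit gets its own UTF-8-style sequence."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp >= 0x10000:
            # Split a supplementary code point into a UTF-16 surrogate pair.
            cp -= 0x10000
            units = (0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF))
        else:
            units = (cp,)
        for unit in units:
            # "surrogatepass" lets the UTF-8 codec emit 3-byte surrogate sequences.
            out += chr(unit).encode("utf-8", "surrogatepass")
    return bytes(out)


ch = "\U00010400"            # a character outside the BMP
utf8 = ch.encode("utf-8")    # 4 bytes: f0 90 90 80
cesu8 = cesu8_encode(ch)     # 6 bytes: ed a0 81 ed b0 80

# A strict UTF-8 decoder rejects the encoded surrogates.
try:
    cesu8.decode("utf-8")
    strict_accepts = True
except UnicodeDecodeError:
    strict_accepts = False

# With surrogateescape, each undecodable byte becomes a lone surrogate in
# U+DC80..U+DCFF, so the bytes round-trip but the text is not the original character.
escaped = cesu8.decode("utf-8", "surrogateescape")
roundtrip = escaped.encode("utf-8", "surrogateescape")
```

Under these assumptions the CESU-8 bytes survive a decode/encode round trip byte for byte, but the decoded string holds six escape code points rather than the single character the data was meant to represent.<br>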
<br>My meta-point is: there are probably many more such issues hidden away, and it is a really bad idea to rush out something like PEP 383. Unicode is hard anyway, and tinkering with its semantics requires a lot of thought.<br>
<br>Tom<br><br></div></div><br>