<div class="gmail_quote">On Wed, Apr 29, 2009 at 23:03, Terry Reedy <span dir="ltr">&lt;<a href="mailto:tjreedy@udel.edu" target="_blank">tjreedy@udel.edu</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">


<div>Thomas Breuel wrote:<br>

<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

<br>

    Sure. However, that requires you to provide meaningful, reproducible<br>

    counter-examples, rather than a stenographic formulation that might<br>

    hint some problem you apparently see (which I believe is just not<br>

    there).<br>

<br>

<br>

Well, here&#39;s another one: PEP 383 would disallow UTF-8 encodings of half surrogates. <br>

</blockquote>

<br></div>

By my reading, the current Unicode 5.1 definition of &#39;UTF-8&#39; disallows that.</blockquote><div><br>If conformance to Unicode 5.1 is the basis for our discussion, then PEP 383 is off the table anyway, since it deliberately puts lone surrogates into decoded strings.  I&#39;m all for strict Unicode compliance, but apparently the Python community doesn&#39;t care.<br>


<br>CESU-8 is described in Unicode Technical Report #26, so it at least has some official recognition.  More importantly, it&#39;s also widely used.  So, my question: what are the implications of PEP 383 for CESU-8 encoded data in Python?<br>
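<br>To make the question concrete, here is a minimal sketch. It assumes a Python 3 interpreter that implements the error handler proposed in PEP 383 under the name surrogateescape. The helper decode_both and the byte strings are hypothetical illustrations built by hand, not data from a real system.<br>

```python
def decode_both(data):
    """Decode bytes as strict UTF-8 and with the PEP 383 handler."""
    try:
        strict = data.decode("utf-8")
    except UnicodeDecodeError:
        # Strict UTF-8 rejects encoded surrogates.
        strict = None
    # The handler maps each undecodable byte 0xNN to U+DCNN.
    escaped = data.decode("utf-8", errors="surrogateescape")
    return strict, escaped


# A lone high surrogate U+D800, written out as three bytes.
half = b"\xed\xa0\x80"
# U+10000 in CESU-8: the surrogate pair D800 DC00, three bytes each.
cesu = b"\xed\xa0\x80\xed\xb0\x80"

for raw in (half, cesu):
    strict, escaped = decode_both(raw)
    # Encoding with the same handler restores the original bytes.
    restored = escaped.encode("utf-8", errors="surrogateescape")
    print(strict, len(escaped), restored == raw)
# prints: None 3 True
# prints: None 6 True
```

Under these assumptions the bytes round-trip, but the CESU-8 sequence comes back as six escape code points rather than as the single character U+10000, which is exactly the kind of interaction I would like to see spelled out.<br>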


<br>My meta-point is: there are probably many more such issues hidden away, and it is a really bad idea to rush something like PEP 383 out.  Unicode is hard anyway, and tinkering with its semantics requires a lot of thought.<br>


<br>Tom<br><br></div></div><br>