[Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

Thu Apr 30 08:28:28 CEST 2009

On Wed, Apr 29, 2009 at 23:03, Terry Reedy <tjreedy at udel.edu> wrote:

> Thomas Breuel wrote:
>
>>
>>    Sure. However, that requires you to provide meaningful, reproducible
>>    counter-examples, rather than a stenographic formulation that might
>>    hint some problem you apparently see (which I believe is just not
>>    there).
>>
>>
>> Well, here's another one: PEP 383 would disallow UTF-8 encodings of half
>> surrogates.
>>
>
> By my reading, the current Unicode 5.1 definition of 'UTF-8' disallows
> that.

If we use conformance to Unicode 5.1 as the basis for our discussion, then
PEP 383 is off the table anyway.  I'm all for strict Unicode compliance.
But apparently, the Python community doesn't care.

CESU-8 is described in Unicode Technical Report #26, so it at least has some
official recognition.  More importantly, it's also widely used.  So, my
question: what are the implications of PEP 383 for CESU-8 encoded data in
Python?
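
To make the question concrete, here is a minimal sketch of what I mean.
It assumes the PEP's error handler is available under the name
"surrogateescape" (the name used in Python 3.1 and later) and that the
UTF-8 decoder is strict about surrogates. The helper cesu8_encode is
hypothetical, written only for this illustration; it is not a standard
library function.

```python
def cesu8_encode(text):
    """Hypothetical helper: encode text as CESU-8 (UTR #26).

    Supplementary characters are split into a UTF-16 surrogate pair,
    and each surrogate is written as its own three-byte sequence.
    """
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp >= 0x10000:
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)
            low = 0xDC00 + (cp & 0x3FF)
            for surrogate in (high, low):
                # "surrogatepass" lets the codec emit the 3-byte form.
                out += chr(surrogate).encode("utf-8", "surrogatepass")
        else:
            out += ch.encode("utf-8", "surrogatepass")
    return bytes(out)


raw = cesu8_encode("\U00010400")  # one supplementary character

# A strict UTF-8 decoder rejects the encoded surrogates.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# With the PEP 383 handler, the offending bytes are smuggled through
# as lone surrogates rather than decoded as the intended character.
escaped = raw.decode("utf-8", "surrogateescape")
round_trip = escaped.encode("utf-8", "surrogateescape")
```

Under these assumptions the bytes survive a round trip, but the decoded
string is a run of escape code points and not the character the CESU-8
producer intended, which is exactly the kind of interaction I'd like to
see spelled out.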

My meta-point is: there are probably many more such issues hidden away and
it is a really bad idea to rush something like PEP 383 out.  Unicode is hard
anyway, and tinkering with its semantics requires a lot of thought.

Tom