[Python-Dev] len(chr(i)) = 2?

Mon Nov 22 09:20:59 CET 2010

> Unicode 5.0, Chapter 3, verse C9:
> 
>     When a process generates a code unit sequence which purports to be
>     in a Unicode character encoding form, it shall not emit ill-formed
>     code sequences.

>  > > A Unicode-conforming Python implementation would error at the
>  > > chr() call, or perhaps would not provide surrogateescape error
>  > > handlers.
>  > 
>  > Chapter and verse?
> 
> Chapter 3, verse C9 again.

I agree that the surrogateescape error handler is non-conforming, but,
as you say, it doesn't claim to, either (would your concern about utf-8
being misleading here been resolved if the thing had been called
"utf-8b"?)

More interestingly (and to the subject) is chr: how did you arrive
at C9 banning Python3's definition of chr? This chr function puts
the code sequence into well-formed UTF-16; that's the whole point of
UTF-16.

Regards,
Martin