[Python-Dev] len(chr(i)) = 2?
Terry Reedy
tjreedy at udel.edu
Thu Nov 25 06:39:30 CET 2010
On 11/24/2010 3:06 PM, Alexander Belopolsky wrote:
> Any non-trivial text processing is likely to be broken in the presence
> of surrogates. Producing them on input is just trading a known issue
> for an unknown one. Processing surrogate pairs in Python code is hard.
> Software that has to support non-BMP characters will most likely be
> written for a wide build and contain subtle bugs when run under a
> narrow build. Note that my latest proposal does not abolish
> surrogates outright. Users who want them can still use something like
> "surrogateescape" error handler for non-BMP characters.
It seems to me that what you are asking for is an alternate, optional
utf-8-bmp codec that would raise an error, in either direction, for
non-BMP characters. Then, as you suggest, code that is not prepared for
surrogates would never be handed them.
--
Terry Jan Reedy