[Python-ideas] Processing surrogates in

Wed May 13 19:45:15 CEST 2015

random832 at fastmail.us writes:

 > If you're using libc, why shouldn't you be using the native wide
 > character types (whether that is UTF-16 or UCS-4) and using the wide
 > string APIs?

Who says you are using libc?  You might be writing an operating system
or a shell script.  And if you do use the native wide character type,
you're guaranteed not to be portable, because on some systems wide
characters are actually variable width and on others they aren't, as
you just pointed out.  Or you might have an ancient byte-oriented
program you want to use.

I'm not saying that UTF-8 is a panacea; just that every problem that
UTF-8 has, UTF-16 also has -- but UTF-16 does have problems that UTF-8
doesn't.  Specifically, surrogates and ASCII incompatibility.
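
A minimal Python 3 sketch of those two points.  The sample strings and
variable names are arbitrary, chosen only for illustration; it shows
that ASCII text survives UTF-8 unchanged but not UTF-16, and that a
code point outside the BMP turns into a surrogate pair in UTF-16 while
still being multi-byte in UTF-8:

```python
# Illustrative sketch; the sample strings below are arbitrary choices.
ascii_text = "abc"
astral = "\U0001F600"  # a code point outside the Basic Multilingual Plane

# ASCII compatibility: UTF-8 encodes ASCII text as the very same bytes,
# while UTF-16 interleaves NUL bytes that byte-oriented programs
# (anything treating NUL as a terminator) will not handle.
utf8_ascii = ascii_text.encode("utf-8")
utf16_ascii = ascii_text.encode("utf-16-le")

# Surrogates: UTF-16 needs two 16-bit code units for a non-BMP code point.
utf16_astral = astral.encode("utf-16-le")
code_units = [int.from_bytes(utf16_astral[i:i + 2], "little")
              for i in range(0, len(utf16_astral), 2)]

# UTF-8 is variable width too, so UTF-16 gains nothing on that front.
utf8_astral = astral.encode("utf-8")

print(utf8_ascii)                    # b'abc'
print(utf16_ascii)                   # b'a\x00b\x00c\x00'
print([hex(u) for u in code_units])  # ['0xd83d', '0xde00']
print(len(utf8_astral))              # 4
```

Both encodings need more than one code unit for this character, which
is the shared problem; the embedded NUL bytes and the surrogate pair
are the ones only UTF-16 has.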