[I18n-sig] Re: [XML-SIG] Character encodings and expat

M.-A. Lemburg mal@lemburg.com
Tue, 31 Oct 2000 21:25:45 +0100

"Martin v. Loewis" wrote:
> > The implementation does use wchar_t where available and usable
> > (meaning that sizeof(wchar_t) == 2).
> There is probably not much point in rehashing the entire discussion,
> but I'd think that wchar_t is usable in more cases as well; specifically on
> Linux, where it is defined to hold ISO 10646 characters.

You probably mean: UCS-4...
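The width of wchar_t is platform-dependent, which is the crux of the
disagreement above: 2 bytes on Windows, typically 4 bytes (UCS-4) on
Linux with glibc. Here is a minimal Python sketch that reports it via
ctypes. The helper name wchar_encoding_form is hypothetical, and the
labels just map the byte width onto the terms used in this thread:

```python
import ctypes

def wchar_encoding_form() -> str:
    """Classify the C library's wchar_t by its size in bytes (hypothetical helper)."""
    size = ctypes.sizeof(ctypes.c_wchar)  # sizeof(wchar_t) on this platform
    if size == 2:
        # The sizeof(wchar_t) == 2 case the implementation above treats as "usable"
        return "UCS-2/UTF-16"
    if size == 4:
        # e.g. Linux/glibc, where wchar_t holds ISO 10646 code points
        return "UCS-4"
    return "unknown (%d bytes)" % size

print(ctypes.sizeof(ctypes.c_wchar), wchar_encoding_form())
```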

> Requiring
> that the elements of a Unicode string have only two bytes will cause
> problems in the long run, IMHO, since it will lead the way to UTF-16,
> which is utter nonsense.
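For context: UTF-16 stores code points above U+FFFF as a surrogate
pair of two 16-bit units, so a two-byte element size leads toward
UTF-16 rather than plain UCS-2. Here is a minimal Python sketch of the
standard surrogate arithmetic (the function name is illustrative only):

```python
def utf16_code_units(cp: int) -> list:
    """Split a code point into UTF-16 code units: one unit, or a surrogate pair."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if cp < 0x10000:
        return [cp]  # BMP: a single 16-bit unit, identical to UCS-2
    offset = cp - 0x10000             # 20 bits remain after removing the BMP
    high = 0xD800 + (offset >> 10)    # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits -> low (trail) surrogate
    return [high, low]

# U+1D11E MUSICAL SYMBOL G CLEF lies outside the BMP
print([hex(u) for u in utf16_code_units(0x1D11E)])  # ['0xd834', '0xdd1e']
```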

We can always move on to UCS-4 at some later point. Right now,
Python's Unicode internals are defined to be UTF-16 without
support for surrogates... which means UCS-2 in most cases.
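"UCS-2 in most cases" holds because most text stays within the Basic
Multilingual Plane (U+0000 to U+FFFF). Here is a minimal Python sketch
of the distinction. The helper names are hypothetical. Note that the
sketch runs on a modern Python 3, where len() counts code points; a
representation built from 16-bit units counts a non-BMP character as
two units:

```python
def fits_ucs2(s: str) -> bool:
    """True if every code point fits in one 16-bit unit and is not a surrogate."""
    return all(ord(ch) <= 0xFFFF and not 0xD800 <= ord(ch) <= 0xDFFF for ch in s)

def utf16_length(s: str) -> int:
    """Number of 16-bit code units s would occupy in UTF-16."""
    return sum(2 if ord(ch) > 0xFFFF else 1 for ch in s)

s = "caf\u00e9 \U0001D11E"
print(len(s), utf16_length(s), fits_ucs2(s))  # 6 7 False
```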

BTW, there are C conversion APIs available to interface directly
with the C library's native wchar_t type. The APIs are also optimized
to copy data only when the sizeof() values differ.
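The C entry points are PyUnicode_FromWideChar() and
PyUnicode_AsWideChar(). The copy-vs-convert idea can be sketched in
pure Python by treating buffers as little-endian arrays of 2- or 4-byte
code units. The function convert_units and its parameters are
hypothetical and not part of any Python API. Also, narrowing to
2 bytes here simply rejects values above U+FFFF instead of producing
surrogates:

```python
import struct

_FORMATS = {2: "H", 4: "I"}  # struct codes for 2- and 4-byte unsigned units

def convert_units(data: bytes, src_size: int, dst_size: int) -> bytes:
    """Convert a little-endian buffer of code units between unit widths (hypothetical)."""
    if len(data) % src_size:
        raise ValueError("buffer length is not a multiple of the unit size")
    if src_size == dst_size:
        # Same layout: pass the data through unchanged, no per-element work
        return data
    count = len(data) // src_size
    units = struct.unpack("<%d%s" % (count, _FORMATS[src_size]), data)
    if dst_size == 2 and any(u > 0xFFFF for u in units):
        raise ValueError("code point above U+FFFF does not fit in a 2-byte unit")
    # Widths differ: re-pack every unit individually
    return struct.pack("<%d%s" % (count, _FORMATS[dst_size]), *units)

ucs4 = convert_units("abc".encode("utf-16-le"), 2, 4)
print(ucs4 == "abc".encode("utf-32-le"))  # True
```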

Marc-Andre Lemburg
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/