[I18n-sig] Re: [XML-SIG] Character encodings and expat
Tue, 31 Oct 2000 21:25:45 +0100
"Martin v. Loewis" wrote:
> > The implementation does use wchar_t where available and usable
> > (meaning that sizeof(wchar_t) == 2).
> There is probably not much point in rehashing the entire discussion,
> but I'd think that wchar_t is usable in more cases as well; specifically
> on Linux, where it is defined to hold ISO 10646 characters.
You probably mean: UCS-4...
> that the elements of a Unicode string have only two bytes will cause
> problems in the long run, IMHO, since it will lead the way to UTF-16,
> which is utter nonsense.
We can always move on to UCS-4 at some later point. Right now,
Python's Unicode internals are defined to be UTF-16 without
support for surrogates... which means UCS-2 in most cases.
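
To make the UCS-2 vs. UTF-16 distinction concrete, here is a small sketch in present-day Python (not the 2000-era internals discussed above). The helper name utf16_code_units is made up for illustration. It shows that a character inside the Basic Multilingual Plane fits in one 16-bit unit, while a character above it needs a surrogate pair, which is exactly the part a "UTF-16 without surrogates" implementation does not handle:

```python
def utf16_code_units(ch):
    """Return the 16-bit code units UTF-16 uses to encode the string ch."""
    data = ch.encode("utf-16-be")  # big-endian, no byte order mark
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

# U+20AC (EURO SIGN) lies inside the BMP: one code unit, same as UCS-2.
bmp = utf16_code_units("\u20ac")

# U+10000 is the first code point above the BMP: UTF-16 needs two
# code units (a high and a low surrogate); UCS-2 cannot represent it.
astral = utf16_code_units("\U00010000")

print([hex(u) for u in bmp])     # one unit
print([hex(u) for u in astral])  # two units: a surrogate pair
```

As long as only BMP characters occur, the two encodings are unit-for-unit identical, which is why the current scheme amounts to UCS-2 in most cases.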
BTW, there are conversion C APIs available that interface directly
with the C lib's native wchar_t type. The APIs are also optimized
to only copy data in case the sizeof() values differ.
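
The size check behind that optimization can be illustrated from the Python side with ctypes. This is only a sketch of the idea, not the C API itself; the function name needs_conversion and the constant WCHAR_SIZE are hypothetical names chosen for the example:

```python
import ctypes

# Width of the platform's wchar_t in bytes
# (typically 2 on Windows, 4 on Linux with glibc).
WCHAR_SIZE = ctypes.sizeof(ctypes.c_wchar)

def needs_conversion(internal_unit_size):
    """Return True if the internal code unit size differs from wchar_t,
    i.e. the buffer cannot simply be handed over unchanged."""
    return internal_unit_size != WCHAR_SIZE

# A wchar_t buffer holding "abc" plus the terminating NUL.
buf = ctypes.create_unicode_buffer("abc")

print("sizeof(wchar_t):", WCHAR_SIZE)
print("buffer bytes:", ctypes.sizeof(buf))
print("2-byte internal units need conversion:", needs_conversion(2))
```

On a platform with a 2-byte wchar_t the last line reports False, which corresponds to the case where the sizes match and no per-element conversion is required.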
Python Pages: http://www.lemburg.com/python/