[XML-SIG] Re: Issues with Unicode type

Martin v. Loewis martin@v.loewis.de
25 Sep 2002 19:37:46 +0200


Lars Marius Garshol <larsga@garshol.priv.no> writes:

> | (BTW, there is no diff between UTF-32 and UCS-4, is there?). 
> 
> UTF-32 is Unicode, UCS-4 is ISO 10646. The Unicode code space used to
> be more restricted than the ISO 10646 one, which ISO was supposed to
> fix.  Not sure whether that fix has gone through yet, but probably it
> has. Once it has there will be no difference.

In addition, UTF-32 is a transfer form, whereas UCS-4 is a code set. Some
revisions of ISO 10646 seemed to imply that UTF-32 is therefore a byte
encoding, but this has since been clarified: it is a transfer form based
on 32-bit code units, with UTF-32BE and UTF-32LE as the possible byte
encodings.
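A minimal sketch of the distinction, assuming modern Python 3 (its
"utf-32-be" and "utf-32-le" codecs are today's standard library and were
not part of the original discussion). The code units are 32-bit integers;
UTF-32BE and UTF-32LE are two byte serializations of the same units:

```python
import struct

def utf32_code_units(text):
    # Hypothetical helper: one 32-bit code unit per code point,
    # independent of any byte order.
    return [ord(ch) for ch in text]

text = "A\u20ac\U0001F600"      # 'A', EURO SIGN, GRINNING FACE
units = utf32_code_units(text)  # [0x41, 0x20AC, 0x1F600]

be = text.encode("utf-32-be")   # big-endian byte encoding, no BOM
le = text.encode("utf-32-le")   # little-endian byte encoding, no BOM

# Both byte strings are just the same code units packed in different orders.
assert be == struct.pack(">%dI" % len(units), *units)
assert le == struct.pack("<%dI" % len(units), *units)

print(be.hex())  # 00000041000020ac0001f600
print(le.hex())  # 41000000ac20000000f60100
```

The plain "utf-32" codec in Python additionally writes a byte order mark
in the machine's native order, which is one more way of serializing the
same code units.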

Apart from that, every character assigned in UCS-4 is represented in
UTF-32 by a code unit with the same value.
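To illustrate that claim, here is a small check, again assuming modern
Python 3, where ord() gives the code point (the UCS-4 value) of a
character. The helper name and the sample characters are illustrative
choices, not taken from the original post:

```python
import struct

def code_unit_matches_code_point(ch):
    # Hypothetical helper: encode one character as UTF-32BE, read the
    # single 32-bit code unit back, and compare it with the code point.
    (unit,) = struct.unpack(">I", ch.encode("utf-32-be"))
    return unit == ord(ch)

# Assigned characters from the BMP and from supplementary planes.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600", "\U00020000"]

for ch in samples:
    print("U+%06X" % ord(ch), code_unit_matches_code_point(ch))
```

Every line prints True: the UTF-32 code unit is simply the UCS-4 value.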

Regards,
Martin