[XML-SIG] Re: Issues with Unicode type

Martin v. Loewis martin@v.loewis.de
25 Sep 2002 20:01:36 +0200


Lars Marius Garshol <larsga@garshol.priv.no> writes:

> Note also that there is one further problem. How long is this string
> 
>   u"\u0041\u030A"
> 
> according to RELAX/XPath/XSDL?

In XML 1.1, you are required to produce NFC "early", i.e. before the
XML document becomes visible. XPath points out that things may work
incorrectly unless the W3C charmod canonical form is used. This matters
not only for length operations, but also for string comparison.
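To make the length question concrete, here is a small sketch in Python 3
using the standard unicodedata module. It assumes that "length" means a
count of code points. The string as written is two code points (A plus
COMBINING RING ABOVE). After NFC it becomes the single precomposed
character U+00C5, so both the length and a naive equality test change
depending on whether normalization happened first.

```python
import unicodedata

# LATIN CAPITAL LETTER A followed by COMBINING RING ABOVE
s = u"\u0041\u030A"

# Canonical composition (NFC) folds the pair into U+00C5
nfc = unicodedata.normalize("NFC", s)

len_raw = len(s)            # 2 code points as written
len_nfc = len(nfc)          # 1 code point after NFC
same_raw = s == u"\u00C5"   # False: plain comparison ignores canonical equivalence
same_nfc = nfc == u"\u00C5" # True once both sides are in NFC

print(len_raw, len_nfc, same_raw, same_nfc)
```

A processor that does not normalize would report length 2 and treat the
two spellings as different strings. That is the ambiguity a spec has to
resolve, either by mandating NFC input or by defining operations over
normalized text.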

Relax does not bother mentioning normalization.

XSDL seems to be largely ignorant of normalization as well, although
it refers to the character model as a non-normative reference.

> The problem here is that the UTF-16 == Unicode assumption is built
> into all sorts of technologies, from Python to Java to Ada-95 to Win32
> to DOM 2.0 to ..., and in most cases people are not even aware of the
> problem. 

Notice that this was a problem even within Unicode itself for a long
time. Some revisions of the Unicode spec ruled out, as non-conforming,
wchar_t implementations that use UCS-4 in memory.
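The UTF-16 vs. UCS-4 difference shows up as soon as a character lies
outside the Basic Multilingual Plane. The sketch below is a Python 3.3+
illustration, where len() counts code points. It compares that count with
the number of UTF-16 code units and UCS-4/UTF-32 units for one such
character. The choice of MUSICAL SYMBOL G CLEF (U+1D11E) is just an
arbitrary example. A "narrow" Python 2 build, which stored strings as
UTF-16, would have reported the UTF-16 figure from len().

```python
# MUSICAL SYMBOL G CLEF, outside the BMP (an arbitrary example character)
ch = u"\U0001D11E"

code_points = len(ch)                           # 1 on Python 3.3+
utf16_units = len(ch.encode("utf-16-be")) // 2  # 2: encoded as a surrogate pair
ucs4_units = len(ch.encode("utf-32-be")) // 4   # 1: one 32-bit unit per code point

# The surrogate pair itself: high surrogate D834, low surrogate DD1E
surrogates = ch.encode("utf-16-be").hex()

print(code_points, utf16_units, ucs4_units, surrogates)
```

Any API that defines "length" or "index" in UTF-16 code units (as the
DOM does) will disagree with one that counts code points, even for
fully normalized text.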

Regards,
Martin