[XML-SIG] Re: Issues with Unicode type

Eric van der Vlist vdv@dyomedea.com
23 Sep 2002 19:06:16 +0200


On Mon, 2002-09-23 at 18:55, Martin v. Loewis wrote:
> Eric van der Vlist <vdv@dyomedea.com> writes:
>
> > > By default Python is using UTF-16 as its Unicode encoding. The
> > > code-point that you specify, U+10800, is outside the BMP and hence is
> > > represented by two surrogate characters in UTF-16.
> >
> > Arg! Does that mean that by default Python doesn't strictly conform to
> > XML 1.0?
>
> No. Why do you think this?

I would say that since an XML document is defined as a sequence of Unicode
characters, the single character "&#x10800;" is not the same thing as a
sequence of two characters. The content of my element
<doc>&#67584;</doc> doesn't seem to be correctly represented by the string
of two characters that I get when I parse the document! Or have I missed
something?
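
To make the distinction concrete, here is a minimal sketch. It assumes a
current Python 3 interpreter, where a string is a sequence of code points,
so it does not reproduce the two-character result I describe above; it only
shows how the one code point relates to its two UTF-16 code units. The
helper names utf16_code_units and surrogate_pair are mine, for illustration
only.

```python
import struct
import xml.etree.ElementTree as ET


def utf16_code_units(text):
    """Return the 16-bit code units of text when encoded as UTF-16."""
    data = text.encode("utf-16-be")
    return [unit for (unit,) in struct.iter_unpack(">H", data)]


def surrogate_pair(code_point):
    """Compute the UTF-16 surrogate pair for a code point above U+FFFF."""
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


# The element content from the message: decimal 67584 is hex 10800.
content = ET.fromstring("<doc>&#67584;</doc>").text

print(len(content))                                  # code points in the content
print([hex(u) for u in utf16_code_units(content)])   # its UTF-16 code units
print([hex(u) for u in surrogate_pair(0x10800)])     # the same pair, by hand
```

On such an interpreter the content is one character at the XML level, and
the surrogate pair only appears once the string is encoded as UTF-16.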

Eric (meaning no offense!)

-- 
Rendez-vous à Paris.
                          http://www.technoforum.fr/integ2002/index.html
------------------------------------------------------------------------
Eric van der Vlist       http://xmlfr.org            http://dyomedea.com
(W3C) XML Schema ISBN:0-596-00252-1 http://oreilly.com/catalog/xmlschema
------------------------------------------------------------------------