writing Unicode objects to XML
Martin v. Löwis
martin at v.loewis.de
Mon May 5 17:40:55 EDT 2003
Steven Taschuk <staschuk at telusplanet.net> writes:
> > There is no way, in XML, to specify which characters will be encoded in the
> > native encoding (e.g. '\xc3\xa8' in utf-8 in this case) and which ones will
> > be encoded using character references instead.
>
> A nit: whether this is true is a property of one's XML tools, not
> a property of XML itself. It is easy to imagine XML writers with
> all sorts of policies about character encoding. (See below.)
Well, no. There is a notion of the "XML Information Set", see
http://www.w3.org/TR/xml-infoset/
In section 2.6, the notion of a "Character Information Item" is introduced.
# There is a character information item for each data character that
# appears in the document, whether literally, as a character
# reference, or within a CDATA section.
The information in a character information item does *not* indicate
whether the character was encoded in its source encoding or written as
a character reference. "Not being part of the XML infoset" is really
the same thing as "no way in XML".
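
To illustrate, here is a minimal sketch, assuming Python 3 and the
standard library's xml.etree.ElementTree (the variable names are made up
for the example). It parses the same element written three ways and
checks that the application sees the same character data each time. It
then serializes one parsed element twice, to show that the choice
between raw bytes and a character reference is made by the writer, not
recorded in the parsed data.

```python
import xml.etree.ElementTree as ET

# The same character (U+00E8) written three ways: literally in UTF-8,
# as a numeric character reference, and inside a CDATA section.
literal = "<p>\u00e8</p>".encode("utf-8")
reference = b"<p>&#xE8;</p>"
cdata = "<p><![CDATA[\u00e8]]></p>".encode("utf-8")

text_literal = ET.fromstring(literal).text
text_reference = ET.fromstring(reference).text
text_cdata = ET.fromstring(cdata).text

# After parsing, nothing distinguishes the three spellings.
assert text_literal == text_reference == text_cdata == "\u00e8"

# On output, the serializer decides: with an ASCII target encoding the
# character has to become a reference; with UTF-8 it is written raw.
elem = ET.fromstring(reference)
as_ascii = ET.tostring(elem, encoding="us-ascii")
as_utf8 = ET.tostring(elem, encoding="utf-8")

assert b"&#232;" in as_ascii
assert b"\xc3\xa8" in as_utf8
```

Note that the element parsed from the character-reference form comes
back out as raw UTF-8 bytes when serialized to UTF-8: the original
spelling is simply not there to be preserved.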
Regards,
Martin