writing Unicode objects to XML

Steven Taschuk staschuk at telusplanet.net
Mon May 5 19:52:46 EDT 2003


Quoth Martin v. Löwis:
> Steven Taschuk <staschuk at telusplanet.net> writes:
  [whether it is possible "in XML" to specify '&#x61;' instead of 'a',
   and to distinguish same]
> > A nit: whether this is true is a property of one's XML tools, not
> > a property of XML itself.  It is easy to imagine XML writers with
> > all sorts of policies about character encoding.  (See below.)
> 
> Well, no. There is a notion of the "XML Information Set", see
  [...]
> The information of a character information item does *not* indicate
> whether the character was encoded in its source encoding or written
> as a character reference. "Not being part of the XML infoset" is
> really the same thing as "no way in XML".

Our disagreement is in this last sentence.  XML is not just the
infoset; it is also a syntax by which the information in the
infoset is (de)serialized.  And at that level, there is indeed a
way to specify and distinguish numeric character references and
literal characters.  For example, I can and sometimes do write XML
in vi, and specify what I want directly.

The infoset is an abstraction layer; but XML is octets too.
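
To make the two levels concrete, here is a minimal sketch using the
standard library's xml.etree.ElementTree (my choice for illustration;
any conforming parser should behave the same way on the parsing side).
The variable names are made up for the example.  The writer half only
shows what this particular tool does, which is the point: the policy
belongs to the tool, not to XML.

```python
import xml.etree.ElementTree as ET

# Two serializations that differ at the octet level.
literal = b"<doc>a</doc>"
reference = b"<doc>&#x61;</doc>"

# After parsing, the distinction is gone: both yield the same character data.
text_literal = ET.fromstring(literal).text
text_reference = ET.fromstring(reference).text

# On the way out, the writer decides.  This writer falls back to a
# character reference when the target encoding cannot represent the
# character, and writes it literally when it can.
elem = ET.fromstring("<doc>\u00e9</doc>")
as_ascii = ET.tostring(elem, encoding="us-ascii")
as_utf8 = ET.tostring(elem, encoding="utf-8")

print(text_literal == text_reference)
print(as_ascii)
print(as_utf8)
```

Someone editing the file by hand, or a writer with a different policy,
is free to pick either form for any character.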

-- 
Steven Taschuk                  staschuk at telusplanet.net
"Telekinesis would be worth patenting."  -- James Gleick




