[XML-SIG] Handling of character entity references

Tim Diggins subscribed@red56.co.uk
Mon, 26 May 2003 15:24:44 +0100


It struck me that another way of looking at the reported problem is that if
one does want to include a **character sequence** like "&eacute;" (ie the
characters and not the entity they represent) in an XML element's content
and then to get that back when parsed, well, then the correct solution is to
escape the & thus: "&amp;eacute;" within the XML element.

What the original correspondent may have misunderstood is that "&amp;" is a
representation of a single Unicode character, and so is "&eacute;".
"& e a c u t e ;" as a series of 8 Unicode characters is represented by
"&amp;eacute;".
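
To make the distinction concrete, here is a minimal sketch using the
ElementTree module from the current Python standard library (my choice for
illustration, not something used in this thread). Because "&eacute;" is not
one of XML's predefined entities and needs a DTD to declare it, the sketch
assumes the numeric reference "&#233;" as a stand-in for the single-character
case.

```python
import xml.etree.ElementTree as ET

# "&amp;eacute;" is an escaped "&" followed by the plain text "eacute;".
literal = ET.fromstring("<p>&amp;eacute;</p>").text

# "&#233;" (stand-in for a declared &eacute;) is the single character U+00E9.
single = ET.fromstring("<p>&#233;</p>").text

# "&amp;" on its own is the single character "&".
amp = ET.fromstring("<p>&amp;</p>").text

print(repr(literal), len(literal))
print(repr(single), len(single))
print(repr(amp), len(amp))
```

After parsing, literal holds the 8 characters "& e a c u t e ;", while
single and amp each hold exactly one character.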

On the other hand, everyone's commentary on the correct way to "design out"
the reported problem is correct, and those are most likely better solutions
than the above.

But considering editing applications (and other cases where one wants to
preserve the characters "& e a c u t e ;") - isn't it easier (better?) to
represent the characters (by escaping the magic &) than to use PIs or a
<char> element?
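
For the editing-application case, a rough sketch of the round trip might look
like the following. It assumes the standard library's xml.sax.saxutils.escape
and ElementTree; the helper name wrap_literal and the <p> wrapper element are
hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def wrap_literal(text):
    """Hypothetical helper: build <p> source whose parsed content equals text."""
    # escape() turns "&" into "&amp;" (and "<", ">" into "&lt;", "&gt;").
    return "<p>" + escape(text) + "</p>"

# What the editor wants to preserve: the 8 literal characters.
source = wrap_literal("&eacute;")

# Parsing gives the literal characters back unchanged.
roundtrip = ET.fromstring(source).text

# Serializing the parsed tree escapes the "&" again.
reserialized = ET.tostring(ET.fromstring(source), encoding="unicode")

print(source)
print(roundtrip)
print(reserialized)
```

The point of the sketch is that no PI or extra element is needed: escaping
on the way in and ordinary parsing on the way out preserve the characters.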

just my 7.5p worth

best

Tim

---------------------------
  Tim Diggins
  mailto:tim@red56.co.uk
  http://www.red56.co.uk/people/tim


> -----Original Message-----
> From: xml-sig-admin@python.org
> [mailto:xml-sig-admin@python.org] On Behalf Of Paul Tremblay
> Sent: 26 May 2003 07:12
> To: xml-sig@python.org
> Subject: Re: [XML-SIG] Handling of character entity references
>
>
> On Sun, May 25, 2003 at 05:42:11PM -0600, Mike Brown wrote:
> > From: Mike Brown <mike@skew.org>
> > Subject: Re: [XML-SIG] Handling of character entity references
> > To: Tamito KAJIYAMA <kajiyama@grad.sccs.chukyo-u.ac.jp>
> > Cc: xml-sig@python.org
> > Date: Sun, 25 May 2003 17:42:11 -0600 (MDT)
> >
> > Tamito KAJIYAMA wrote:
> > > "Thomas B. Passin" <tpassin@comcast.net> writes:
> > > |
> > > | [<pyxml@wonderclown.com>]
> > > |
> > > | > I am trying to produce XHTML files from input XML files which contain
> > > | > a mixture of XHTML and custom markup.
> > > | >...
> > > | > I'm
> > > | > having a problem, though, getting character entity references in the
> > > | > source document to pass through to the output. Things like &amp;,
> > > | > &lt;, and &gt; work fine, but &eacute; does not.
> > > | >
> > > |