[XML-SIG] CDATA sections still not handled
matt
matt@virtualspectator.com
Fri, 19 Jan 2001 01:20:52 +1300
On Thu, 18 Jan 2001, Ken MacLeod wrote:
> Matt,
>
> If I understand this thread correctly, it's the common "how do I pass
> XML inside XML" question.
Sort of ... but that will answer it too.
>
> CDATA sections are not relevant to this question. These two XML
> fragments are equivalent for all practical purposes:
>
> <my-tag><![CDATA[Some <tags> & &entities; inside XML]]></my-tag>
>
> <my-tag>Some &lt;tags&gt; &amp; &amp;entities; inside XML</my-tag>
>
> In both cases your application will see:
>
> startElement() with element name 'my-tag'
> characters() with data "Some <tags> & &entities; inside XML"
> endElement() with element name 'my-tag'
>
Yes, yes, that is what I have been trying to say. CDATA just lets the text
remain human-readable in the original document. But once it has been through a
DOM implementation all of that is gone, and you get the second form back out.
That is fine w.r.t. parsing down the line, but not much fun when perusing
modified documents.
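
The equivalence Ken describes can be checked with a short sketch using the
standard xml.sax module. EventLog and sax_events are illustrative names, not
part of any library. Text is buffered because a parser may deliver
characters() in several chunks.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class EventLog(ContentHandler):
    """Record SAX events, merging consecutive characters() calls."""

    def __init__(self):
        super().__init__()
        self.events = []
        self._text = []

    def _flush(self):
        # Emit any buffered text as a single "characters" event.
        if self._text:
            self.events.append(("characters", "".join(self._text)))
            self._text = []

    def startElement(self, name, attrs):
        self._flush()
        self.events.append(("startElement", name))

    def characters(self, content):
        self._text.append(content)

    def endElement(self, name):
        self._flush()
        self.events.append(("endElement", name))


def sax_events(document):
    """Hypothetical helper: parse a string and return the recorded events."""
    handler = EventLog()
    xml.sax.parseString(document.encode("utf-8"), handler)
    return handler.events


WITH_CDATA = "<my-tag><![CDATA[Some <tags> & &entities; inside XML]]></my-tag>"
WITH_ESCAPES = "<my-tag>Some &lt;tags&gt; &amp; &amp;entities; inside XML</my-tag>"

# Both spellings produce the same event stream.
print(sax_events(WITH_CDATA) == sax_events(WITH_ESCAPES))
```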
>
> That the data "is" XML is also not relevant to this question, it could
> be any type of data that contains markup characters.
Yes, I also include program fragments sometimes, so that's another good
example.
>
> If you want to "do something with the XML" inside the XML, the easiest
> way is to use another instance of a parser to parse the string as XML.
>
Yep, I mentioned that in about my second email: some "other" process will be
the thing that reads this data and "possibly" validates it, if it indeed
needs to.
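
A minimal sketch of that "second parser" step, assuming the payload sits in
the text of a wrapper element. The element names and the parse_embedded
helper are made up for illustration; xml.etree.ElementTree is used here only
because it is short, and any parser would do.

```python
import xml.etree.ElementTree as ET

# Hypothetical outer document carrying XML as character data.
OUTER = '<wrapper><![CDATA[<inner id="1">payload &amp; more</inner>]]></wrapper>'


def parse_embedded(outer_xml):
    """Return (payload_text, parsed_payload_or_None)."""
    outer = ET.fromstring(outer_xml)
    payload = outer.text or ""
    try:
        # A separate parse: only now is the payload checked for
        # well-formedness.
        return payload, ET.fromstring(payload)
    except ET.ParseError:
        # The payload is not XML (e.g. a program fragment); keep it as text.
        return payload, None


payload, inner = parse_embedded(OUTER)
print(payload)
print(inner.tag, inner.get("id"), inner.text)
```

The outer parse never looks inside the payload, so a program fragment such as
"if (a < b) { run(); }" passes through untouched and simply comes back with
None as its parsed form.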
> If you are interested in preserving the fact that the original file
> used a CDATA section to escape the markup, instead of entities to
> escape the markup, I believe SAX2 does provide that information, but
> you need to evaluate whether or not that really does what you want.
> Besides downplaying CDATA sections, a SAX parser is going to normalize
> a lot of other characters from the original file before it passes it
> to you, in such a way that you really can't reproduce the original
> file.
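
As a rough sketch of the SAX2 facility Ken mentions: the lexical-handler
property reports where CDATA sections start and end. This assumes the default
expat-based reader in Python's standard library; CDataAwareHandler and
tagged_text are illustrative names. The handler defines all five lexical
callbacks because the reader may look each of them up.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, property_lexical_handler


class CDataAwareHandler(ContentHandler):
    """Collect text chunks, tagging each with whether it came from CDATA."""

    def __init__(self):
        super().__init__()
        self.in_cdata = False
        self.chunks = []  # list of (was_cdata, text)

    def characters(self, content):
        self.chunks.append((self.in_cdata, content))

    # Lexical events (SAX2 LexicalHandler interface).
    def startCDATA(self):
        self.in_cdata = True

    def endCDATA(self):
        self.in_cdata = False

    def comment(self, content):
        pass

    def startDTD(self, name, public_id, system_id):
        pass

    def endDTD(self):
        pass


def tagged_text(document):
    """Hypothetical helper: parse a string and return the tagged chunks."""
    handler = CDataAwareHandler()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    # Register the same object for lexical events.
    parser.setProperty(property_lexical_handler, handler)
    parser.parse(io.BytesIO(document.encode("utf-8")))
    return handler.chunks


for was_cdata, text in tagged_text("<t>a &lt; b <![CDATA[c < d]]></t>"):
    print(was_cdata, repr(text))
```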
Yes, I found that both fortunate and unfortunate. I now see that if I want my
data to remain clean, in the sense that I can still look at it and read it with
some ease, then I need to write my own reverse-translation method, rewrap
those text data nodes in CDATA sections again, and save that document.
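
One way such a rewrapping pass might look, sketched with xml.dom.minidom from
the standard library rather than whatever DOM is actually in use. The helper
names are hypothetical, and the rule for choosing which nodes to wrap (any
text containing "<" or "&") is an assumption. Text containing "]]>" is left
alone, since that sequence cannot appear inside a CDATA section.

```python
from xml.dom import minidom, Node


def rewrap_as_cdata(node, doc):
    """Replace text nodes that contain markup characters with CDATA sections."""
    for child in list(node.childNodes):  # copy: the child list is mutated
        if child.nodeType == Node.TEXT_NODE:
            text = child.data
            if ("<" in text or "&" in text) and "]]>" not in text:
                node.replaceChild(doc.createCDATASection(text), child)
        elif child.nodeType == Node.ELEMENT_NODE:
            rewrap_as_cdata(child, doc)


def rewrite_with_cdata(xml_text):
    """Parse, rewrap, and serialize the document element."""
    doc = minidom.parseString(xml_text)
    doc.normalize()  # merge adjacent text nodes before inspecting them
    rewrap_as_cdata(doc.documentElement, doc)
    return doc.documentElement.toxml()


ESCAPED = "<my-tag>Some &lt;tags&gt; &amp; &amp;entities; inside XML</my-tag>"
print(rewrite_with_cdata(ESCAPED))
```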
>
> Does that help?
Yes, very much so. It means I WAS on the right track, and that it IS normal to
want to put XML or XML-like data within an XML document and not have it parsed
for well-formedness. Maybe I am a rare exception in that my translated CDATA,
i.e. the same text written with entity references, just looks such a nightmare
to read through. Keeping the original characters speeds up debugging of the
contained data immensely.
>
> -- Ken
>
thanks
regards
Matt
> _______________________________________________
> XML-SIG maillist - XML-SIG@python.org
> http://mail.python.org/mailman/listinfo/xml-sig