[XML-SIG] CDATA sections still not handled

matt matt@virtualspectator.com
Fri, 19 Jan 2001 01:20:52 +1300


On Thu, 18 Jan 2001, Ken MacLeod wrote:
> Matt,
> 
> If I understand this thread correctly, it's the common "how do I pass
> XML inside XML" question.

sort of ... but that will answer it too.

> 
> CDATA sections are not relevant to this question.  These two XML
> fragments are equivalent for all practical purposes:
> 
>   <my-tag><![CDATA[Some <tags> &amp; &entities; inside XML]]></my-tag>
> 
>   <my-tag>Some &lt;tags> &amp;amp; &amp;entities; inside XML</my-tag>
> 
> In both cases your application will see:
> 
>   startElement()  with element name 'my-tag'
>   characters()    with data "Some <tags> &amp; &entities; inside XML"
>   endElement()    with element name 'my-tag'
> 

Yes, yes, that is what I have been trying to say.  CDATA just lets the text
remain human readable in the original document.  But once the document has been
through a DOM implementation all of that is gone, and you get the second form
back out.  That is fine w.r.t. parsing down the line, but not much fun when
perusing modified documents.
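
For anyone who wants to check the equivalence Ken describes, here is a minimal
sketch using the SAX support in the Python standard library.  EventRecorder and
sax_events are names made up for this illustration, and adjacent characters()
calls are merged because a parser is free to deliver text in several chunks.

```python
import xml.sax


class EventRecorder(xml.sax.ContentHandler):
    """Hypothetical helper: records SAX events as (event, value) tuples."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("startElement", name))

    def characters(self, content):
        # Text may arrive in several chunks, so merge adjacent ones.
        if self.events and self.events[-1][0] == "characters":
            self.events[-1] = ("characters", self.events[-1][1] + content)
        else:
            self.events.append(("characters", content))

    def endElement(self, name):
        self.events.append(("endElement", name))


def sax_events(xml_text):
    """Parse a string and return the recorded list of events."""
    recorder = EventRecorder()
    xml.sax.parseString(xml_text.encode("utf-8"), recorder)
    return recorder.events


with_cdata = "<my-tag><![CDATA[Some <tags> &amp; &entities; inside XML]]></my-tag>"
with_entities = "<my-tag>Some &lt;tags> &amp;amp; &amp;entities; inside XML</my-tag>"

# Both fragments should produce exactly the same event stream.
print(sax_events(with_cdata) == sax_events(with_entities))
```

The application sees the same three events either way, so nothing at this level
records which form the original file used.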


> 
> That the data "is" XML is also not relevant to this question, it could
> be any type of data that contains markup characters.

Yes, I also include program fragments sometimes ..... so that's another good
example.

> 
> If you want to "do something with the XML" inside the XML, the easiest
> way is to use another instance of a parser to parse the string as XML.
> 

Yep, I mentioned that in about my second email: some "other" process will be
the thing that reads this data and "possibly" validates it, if it indeed
needs to.
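
A rough sketch of that "second parser" step, using xml.etree.ElementTree from
the standard library.  The message/payload element names and the parse_payload
helper are invented for the example; the only point is that the inner text is
handed to a separate parse call and may or may not turn out to be well-formed.

```python
import xml.etree.ElementTree as ET

# Made-up outer document carrying an inner XML document as plain text.
outer = (
    "<message><payload>"
    "<![CDATA[<order id='7'><item>tea</item></order>]]>"
    "</payload></message>"
)


def parse_payload(outer_xml):
    """Return the root element of the embedded document, or None if the
    payload text is missing or not well-formed XML."""
    payload_text = ET.fromstring(outer_xml).findtext("payload", default="")
    try:
        # Second, independent parse of the string taken from the outer document.
        return ET.fromstring(payload_text)
    except ET.ParseError:
        return None


inner = parse_payload(outer)
print(inner.tag, inner.get("id"), inner.findtext("item"))
```

The outer parse never looks inside the payload, so a consumer that does not
care about the embedded data can skip the second step entirely.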


> If you are interested in preserving the fact that the original file
> used a CDATA section to escape the markup, instead of entities to
> escape the markup, I believe SAX2 does provide that information, but
> you need to evaluate whether or not that really does what you want.
> Besides downplaying CDATA sections, a SAX parser is going to normalize
> a lot of other characters from the original file before it passes it
> to you, in such a way that you really can't reproduce the original
> file.
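
The SAX2 hook in question is the lexical handler, with its startCDATA() and
endCDATA() callbacks.  A small sketch, assuming the parser returned by
xml.sax.make_parser() is the expat-based reader and accepts the lexical-handler
property (worth checking for any other driver).  CDATATracker and
text_with_cdata_flags are made-up names for the illustration.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, property_lexical_handler


class CDATATracker(ContentHandler):
    """Hypothetical handler: tags each text chunk with whether it came
    from inside a CDATA section."""

    def __init__(self):
        super().__init__()
        self.in_cdata = False
        self.chunks = []  # list of (came_from_cdata, text)

    def characters(self, content):
        self.chunks.append((self.in_cdata, content))

    # Lexical events (SAX2 LexicalHandler interface).
    def startCDATA(self):
        self.in_cdata = True

    def endCDATA(self):
        self.in_cdata = False

    # The remaining lexical callbacks are not needed here.
    def comment(self, content):
        pass

    def startDTD(self, name, public_id, system_id):
        pass

    def endDTD(self):
        pass


def text_with_cdata_flags(xml_text):
    """Parse a string and return (came_from_cdata, text) chunks."""
    tracker = CDATATracker()
    parser = xml.sax.make_parser()
    parser.setContentHandler(tracker)
    # Assumption: the driver supports the lexical-handler property.
    parser.setProperty(property_lexical_handler, tracker)
    parser.parse(io.BytesIO(xml_text.encode("utf-8")))
    return tracker.chunks


print(text_with_cdata_flags("<my-tag><![CDATA[Some <tags>]]> and text</my-tag>"))
```

This only recovers where the CDATA boundaries were; it does not bring back the
other details of the original file that the parser normalizes away.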

Yes, I found that both fortunate and unfortunate.  I now see that if I want my
data to remain clean, in the sense that I can still look at it and read it with
some ease, then I need to write my own reverse-translation method, rewrap those
text nodes in CDATA sections again, and save that document.
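
One possible shape for such a rewrap step, sketched with xml.dom.minidom.  The
rule "any text node containing < or & goes into a CDATA section" is only an
assumption made for the example, and rewrap_as_cdata is a made-up name.  Text
containing "]]>" is left alone because it cannot be written as a single CDATA
section.

```python
from xml.dom import minidom


def rewrap_as_cdata(xml_text):
    """Parse xml_text and return it re-serialized, with markup-heavy text
    nodes replaced by CDATA sections (illustrative heuristic only)."""
    doc = minidom.parseString(xml_text)

    def walk(node):
        # Copy the child list because nodes are replaced while iterating.
        for child in list(node.childNodes):
            if child.nodeType == child.TEXT_NODE:
                data = child.data
                # Assumed rule: wrap text that would otherwise need escaping.
                if ("<" in data or "&" in data) and "]]>" not in data:
                    node.replaceChild(doc.createCDATASection(data), child)
            elif child.nodeType == child.ELEMENT_NODE:
                walk(child)

    walk(doc.documentElement)
    return doc.documentElement.toxml()


escaped = "<my-tag>Some &lt;tags> &amp;amp; &amp;entities; inside XML</my-tag>"
print(rewrap_as_cdata(escaped))
```

Run on the entity-escaped fragment from earlier in the thread, this should give
back the CDATA form, while elements holding only plain text are left as they
were.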


> 
> Does that help?

Yes, very much so.  It means I WAS on the right track, and that it IS normal to
want to put XML or XML-like data within an XML document and not have it parsed
for well-formedness.  Maybe I am a rare exception, but my translated CDATA,
i.e. written as entity references, is just such a nightmare to read through.
Keeping the original characters speeds up debugging of the contained data
immensely.


> 
>   -- Ken
> 

thanks
regards
Matt

