[XML-SIG] CDATA sections still not handled

Wed, 17 Jan 2001 13:42:17 +1300

On Wed, 17 Jan 2001, Martin v. Loewis wrote:
> > I was following the logic that ext.PrettyPrint can write to a stream
> 
> That assumption is good, it indeed does.
> 
> > and that it is useful to pick up a document that has escaped
> > data (which may be xml itself), add some nodes to it, and save it
> > back to the stream expecting the escaped sections to be still
> > present as escaped sections.
> 
> That logic is flawed (or, there is no logic in it - that's just an
> assertion). Why is that useful? I.e. why would anybody who'll read the
> resulting document need to know where exactly the CDATA sections were
> located in the original document?

Um, I actually don't care where the CDATA sections are in the document. The
scenario I was alluding to is this: one reads an XML document in from a file.
One has NO interest in parsing, rendering, or interpreting its content, but
does want to locate a particular node, add a new fragment to it, and then save
the modified document back to a file via ext.PrettyPrint (which is what I am
using). In that case one obviously does not want the CDATA markers removed,
because 1) one may not have written the original document, and 2) one is not
trying to interpret it; that will happen at some later stage, with an
event-driven XML parser. Considering that DOM is useful for document assembly,
I don't see any flaw in this logic. You missed my point entirely: I don't care
where they are in the document.
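
The workflow described above (read a file, find one element, add a fragment, write it back) can be sketched with the Python standard library's xml.dom.minidom. This is an assumption: it stands in for PyXML's ext.PrettyPrint, which is what this thread is actually about. Current minidom builds CDATASection nodes from CDATA markup and writes them back out as CDATA, so sections you never touch survive the round trip. The helper `add_fragment` and the element names `config`, `script` and `entry` are hypothetical.

```python
from xml.dom import minidom

def add_fragment(xml_text, parent_name, fragment_text):
    """Parse xml_text, append a parsed fragment under the first element
    named parent_name, and return the re-serialized root element."""
    doc = minidom.parseString(xml_text)
    parent = doc.getElementsByTagName(parent_name)[0]
    # Parse the fragment on its own, then import a deep copy into doc.
    fragment = minidom.parseString(fragment_text).documentElement
    parent.appendChild(doc.importNode(fragment, True))
    return doc.documentElement.toxml()

source = "<config><script><![CDATA[if (a < b) { go(); }]]></script></config>"
print(add_fragment(source, "config", "<entry>new</entry>"))
# <config><script><![CDATA[if (a < b) { go(); }]]></script><entry>new</entry></config>
```

Whether a given serializer keeps CDATA depends on that serializer; the point of the sketch is only that nothing in the DOM itself forces the markers to be dropped.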

> 
> > So what I understand now is that I should either use a serializer
> > that keeps these, or write a DTD and use that to write my xml back
> > out to file in a more proper way.
> 
> I think your understanding is incorrect. It is not possible to write a
> serializer that produces the original input by just looking at the DOM
> tree, and having a DTD does not help at all, either.

Again, you are on the wrong track: I don't care about order.
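
Martin's claim that a serializer cannot recover the original bytes from the DOM tree alone can be illustrated with a small sketch. It uses Python's standard xml.dom.minidom (an assumption; not PyXML) and a hypothetical helper `reserialize`: two inputs that differ in quote style, whitespace inside the tag, and empty-element syntax produce the same tree, so they come back out identically.

```python
from xml.dom import minidom

def reserialize(xml_text):
    """Parse xml_text and serialize the resulting root element again."""
    return minidom.parseString(xml_text).documentElement.toxml()

# Same information, different bytes on disk.
first = "<item  id='7' ></item>"
second = '<item id="7"/>'

# The tree records neither the quote style, the extra whitespace, nor
# which empty-element form was used, so both inputs serialize the same way.
print(reserialize(first))   # <item id="7"/>
print(reserialize(second))  # <item id="7"/>
```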

> 
> > Which I guess is my next question, what is the cleanest method in
> > PyXML for reading in such a file with CDATA sections, and getting
> > them back out when rewriting?
> 
> The cleanest way is to accept that it is not possible to write the
> document back so that it equals the original document on a
> byte-by-byte basis.
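
The distinction between bytes and content can be checked directly. A minimal sketch, again assuming Python's xml.dom.minidom as a stand-in for PyXML, with a hypothetical helper `text_content`: a CDATA section and the equivalent escaped text are different bytes but parse to the same character data.

```python
from xml.dom import minidom

def text_content(xml_text):
    """Return the concatenated character data of the root element,
    whether it arrived as Text or as CDATASection nodes."""
    root = minidom.parseString(xml_text).documentElement
    return "".join(
        child.data
        for child in root.childNodes
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE)
    )

as_cdata = "<msg><![CDATA[<b>bold</b> & more]]></msg>"
as_escaped = "<msg>&lt;b&gt;bold&lt;/b&gt; &amp; more</msg>"

print(as_cdata == as_escaped)                              # False
print(text_content(as_cdata) == text_content(as_escaped))  # True
```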

Maybe the following, which is the hack I use to get CDATA back into the file,
will explain why it is useful. Presumably you would expect that if you opened
an XML file into a DOM tree and then saved it again, it would still be the same
"kind" of document, i.e. CDATA nodes would STILL be CDATA nodes.

Yes, I assume 1) the node name is unique and 2) its first child is a text
node:

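def convertTextNodeToCDataNodeByName(doc, name):
    # Assumes the element name is unique and its first child is a text node.
    node_list = doc.getElementsByTagName(name)
    text_node = node_list[0].firstChild
    # Wrap the raw character data; serialized text would already be escaped
    # (&lt; etc.) and would then be escaped again inside the CDATA section.
    new_cdata_node = doc.createCDATASection(text_node.data)
    text_node.parentNode.replaceChild(new_cdata_node, text_node)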
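
The function above depends on the two assumptions stated just before it. A sketch that drops both (it handles every element with the given name and every non-whitespace text child) might look like the following. It assumes Python's xml.dom.minidom and uses only the standard DOM method createCDATASection; the name `wrap_text_in_cdata` is hypothetical. It passes the raw `.data` rather than serialized text so markup characters are not escaped twice.

```python
from xml.dom import minidom

def wrap_text_in_cdata(doc, name):
    """Replace every non-whitespace Text child of every element named
    `name` with a CDATA section holding the same character data."""
    for element in doc.getElementsByTagName(name):
        # Copy the list: replaceChild mutates childNodes while we iterate.
        for child in list(element.childNodes):
            if child.nodeType != child.TEXT_NODE or not child.data.strip():
                continue
            if "]]>" in child.data:
                continue  # "]]>" cannot appear inside a CDATA section; keep as text
            element.replaceChild(doc.createCDATASection(child.data), child)
    return doc

doc = minidom.parseString("<root><code>a &lt; b</code><code>x &amp; y</code></root>")
print(wrap_text_in_cdata(doc, "code").documentElement.toxml())
# <root><code><![CDATA[a < b]]></code><code><![CDATA[x & y]]></code></root>
```

Mixed content keeps its order because each text node is replaced in place rather than merged.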

> 
> It is possible to write the document back so that the content is the
> same as in the original document; the cleanest way for that is to use
> ext.PrettyPrint.
> 
> Regards,
> Martin
> 
> P.S. What you *can* get back is CDATA sections for every text element,
> by properly inheriting from the PrettyPrinter. However, this will give
> you CDATA sections in places where the original document had none.
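
Martin's P.S. works at the serializer level rather than by editing the tree. PyXML's PrettyPrinter class is not reproduced here; instead, a minimal hypothetical stand-alone writer over a Python xml.dom.minidom tree shows the effect, including the drawback he mentions: text that was never in a CDATA section comes out as one. The function name `write_all_cdata` is an assumption, and the writer covers only elements, attributes, text and CDATA.

```python
from xml.dom import minidom
from xml.sax.saxutils import escape, quoteattr

def write_all_cdata(node, out):
    """Serialize `node` into the list `out`, emitting every non-whitespace
    text node as a CDATA section. Handles elements, attributes, text and
    CDATA only; other node types (comments, PIs) are silently skipped."""
    if node.nodeType == node.ELEMENT_NODE:
        out.append("<" + node.tagName)
        for name, value in node.attributes.items():
            out.append(" %s=%s" % (name, quoteattr(value)))
        out.append(">")
        for child in node.childNodes:
            write_all_cdata(child, out)
        out.append("</%s>" % node.tagName)
    elif node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE):
        data = node.data
        if data.strip() and "]]>" not in data:
            out.append("<![CDATA[%s]]>" % data)
        else:
            out.append(escape(data))  # whitespace, or text CDATA cannot hold
    return out

doc = minidom.parseString('<note lang="en"><to>Tove</to><body>1 &lt; 2</body></note>')
print("".join(write_all_cdata(doc.documentElement, [])))
# <note lang="en"><to><![CDATA[Tove]]></to><body><![CDATA[1 < 2]]></body></note>
```

Note that `Tove` was plain text in the input and is CDATA in the output, which is exactly the side effect described in the P.S.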
-- 

regards
Matt