[XML-SIG] Preserving XML and DocType declaration attributes using DOM

19 Mar 2002 21:36:51 +0100

Dinu Gherman <gherman@darwin.in-berlin.de> writes:

> Hi, I have something like the DOM code below, where I don't succeed in
> getting at the doctype attributes. I'm trying to read an XML file with 
> DOM, manipulate it and save it to a new file... The PrettyPrint function
> seems not to preserve the attributes of the XML and doctype declarations. 
> 
> Is there some other canonical way of doing this, maybe? Or is it an
> issue with the Python 2.2 and PyXML 0.7 I'm using?

To my knowledge, the DOM, as specified, does not support this kind of
operation (at least not in DOM Level 2). Nor does any of the Python
DOM implementations provide it as an extension.

If you happen to know what the document type declaration should have
been in the document, you can easily write it back out when printing
the document.
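
As a minimal sketch of that approach (written against the standard
library's xml.dom.minidom rather than PyXML, and with a made-up "note"
document type; KNOWN_PROLOG and dump_with_known_prolog are names
invented for this illustration), you could serialize only the root
element and prepend the declarations yourself:

```python
from xml.dom import minidom

# Assumption: we already know what the prolog of the document should be.
# The "note" document type and "note.dtd" are purely illustrative.
KNOWN_PROLOG = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<!DOCTYPE note SYSTEM "note.dtd">\n'
)

def dump_with_known_prolog(doc, prolog=KNOWN_PROLOG):
    """Return the document as text, with a hand-written prolog.

    Only the root element is serialized, so comments or processing
    instructions outside the root element are not written out.
    """
    return prolog + doc.documentElement.toxml() + "\n"

# Read, manipulate, and write back out.
doc = minidom.parseString("<note><to>Dinu</to></note>")
doc.documentElement.setAttribute("lang", "en")
text = dump_with_known_prolog(doc)
```

The result is a text string; when saving it, the file has to be
written in the encoding named in the XML declaration.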

If you need round-trip support for any kind of document type, you had
best select a parser that both
a) passes document type fragments to the application, and
b) can be used to build a DOM tree.

You would then need to hook into the DOM building process, forking the
DTD data into a separate object.
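
A rough sketch of the idea, assuming the expat bindings from the
standard library: expat reports the XML declaration and the start of
the document type declaration through XmlDeclHandler and
StartDoctypeDeclHandler. Instead of hooking into the DOM builder
itself, this simplified version makes a second, lightweight expat pass
over the same data and stores what it sees in a separate object.
PrologInfo and parse_keeping_prolog are hypothetical names, and the
sample document is made up.

```python
import xml.parsers.expat
from xml.dom import minidom

class PrologInfo:
    """Hypothetical holder for prolog data kept outside the DOM tree."""
    def __init__(self):
        self.version = None
        self.encoding = None
        self.standalone = None      # expat reports -1 when it is omitted
        self.root_name = None
        self.system_id = None
        self.public_id = None
        self.has_internal_subset = False

def parse_keeping_prolog(data):
    """Parse bytes into a DOM tree and, separately, collect prolog data."""
    info = PrologInfo()

    def on_xml_decl(version, encoding, standalone):
        info.version = version
        info.encoding = encoding
        info.standalone = standalone

    def on_doctype(name, system_id, public_id, has_internal_subset):
        info.root_name = name
        info.system_id = system_id
        info.public_id = public_id
        info.has_internal_subset = bool(has_internal_subset)

    # First pass: expat only, to fork off the prolog data.
    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_xml_decl
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(data, True)

    # Second pass: build the DOM tree as usual.
    return minidom.parseString(data), info

SAMPLE = (b'<?xml version="1.0" encoding="ISO-8859-1"?>\n'
          b'<!DOCTYPE note SYSTEM "note.dtd">\n'
          b'<note>hi</note>')
doc, info = parse_keeping_prolog(SAMPLE)
```

Parsing twice is wasteful for large documents; a builder that installs
these handlers on its own parser would avoid that, at the cost of
depending on the builder's internals.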

Notice that, in general, this is a tricky problem: DTDs are *very*
expressive, with conditional sections etc., so that an object
representing the full grammatical structure of a DTD would be quite
sophisticated.

If you know that there will never be an internal DTD subset, the
problem is simplified significantly, as you only need to store the
public and system identifiers and the root element name.
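
For that simplified case, the stored data could be as small as the
record sketched below. DoctypeIds and _quote are hypothetical names,
not part of any DOM implementation; the record just holds the three
values and renders them back into a declaration.

```python
from typing import NamedTuple, Optional

def _quote(literal):
    """Quote an identifier, preferring double quotes."""
    if '"' not in literal:
        return '"%s"' % literal
    if "'" not in literal:
        return "'%s'" % literal
    raise ValueError("identifier contains both kinds of quote")

class DoctypeIds(NamedTuple):
    """All that must be kept when there is no internal DTD subset."""
    root_name: str
    public_id: Optional[str]
    system_id: Optional[str]

    def declaration(self):
        """Render the stored values as a document type declaration."""
        if self.public_id is not None:
            if self.system_id is None:
                # XML requires a system identifier after a public one.
                raise ValueError("public identifier without system identifier")
            return "<!DOCTYPE %s PUBLIC %s %s>" % (
                self.root_name, _quote(self.public_id), _quote(self.system_id))
        if self.system_id is not None:
            return "<!DOCTYPE %s SYSTEM %s>" % (
                self.root_name, _quote(self.system_id))
        return "<!DOCTYPE %s>" % self.root_name

xhtml = DoctypeIds(
    "html",
    "-//W3C//DTD XHTML 1.0 Strict//EN",
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd",
)
line = xhtml.declaration()
```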

Regards,
Martin