Problems with XML and DTD usage

Chris Prinos cprinos at foliage.com
Wed May 29 21:16:53 EDT 2002


Using Python 2.1 and PyXML 0.7.1, I'm having some difficulty with XML
documents that use a DTD. What I want to do is parse an XML document whose
doctype declaration specifies the DTD, and validate it against that DTD. I
then need to manipulate the document a bit (keeping it valid) and write it
back out to a file.

If I parse with validation, the validation takes place, but the resulting
document contains an empty root node. If I parse without validation, I get
the whole document, but the doctype declaration no longer contains a
systemId when streamed out after manipulation.

Shouldn't parsing with and without validation return the same document
object (assuming the document is valid to begin with)? And shouldn't the
non-validating parser preserve the doctype declaration in the resulting
document instance (even though the parser doesn't use it to validate the
XML)?

Chris


-- t1.xml --
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE a SYSTEM "t1.dtd">
<a>
 <b>simple test</b>
</a>

-- t1.dtd --
<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT a (b)>
<!ELEMENT b (#PCDATA)>

-- test code --
>>> import xml.dom.ext.reader.Sax2 as Sax2
>>> ValReader = Sax2.Reader(validate=1)
>>> NonValReader = Sax2.Reader(validate=0)
>>> vd = ValReader.fromStream(open('t1.xml'))
>>> nvd = NonValReader.fromStream(open('t1.xml'))
>>> from xml.dom.ext import PrettyPrint as PPrint
>>>
>>> PPrint(vd)    # this shows vd to have an empty root
<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE a SYSTEM "t1.dtd">
<a/>
>>>
>>> PPrint(nvd)  # this shows nvd's doctype declaration has lost its SYSTEM identifier
<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE a>
<a>
  <b>simple test</b>
</a>
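
For comparison, here is a minimal sketch of the result I would expect from a non-validating parse: the doctype keeps its systemId and the root keeps its child. Note the assumptions: this uses the standard library's xml.dom.minidom as a stand-in, not the PyXML Sax2 reader from the session above, and it does no validation at all. The helper name summarize and the T1_XML string are made up for illustration; the string just repeats t1.xml so the snippet does not need the files on disk.

```python
from xml.dom import minidom

# Same content as t1.xml above, inlined so no file is needed.
T1_XML = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<!DOCTYPE a SYSTEM "t1.dtd">\n'
    '<a>\n'
    ' <b>simple test</b>\n'
    '</a>\n'
)


def summarize(doc):
    """Return (doctype name, doctype systemId, tag names of the root's child elements)."""
    doctype = doc.doctype
    root = doc.documentElement
    children = [node.tagName for node in root.childNodes
                if node.nodeType == node.ELEMENT_NODE]
    return (doctype.name, doctype.systemId, children)


# minidom is non-validating: it records the doctype but never reads t1.dtd.
doc = minidom.parseString(T1_XML)
summary = summarize(doc)

# Serialized form; the DOCTYPE line should still carry the system identifier.
serialized = doc.toxml()
```

Running the same summarize helper on the documents returned by the two Sax2 readers would be one way to pin down exactly where they differ, assuming their DOM nodes expose the same doctype and childNodes attributes.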





