[XML-SIG] pyexpat: Comments before DOCTYPE
Ingo van Lil
inguin at gmx.de
Mon Feb 13 15:30:42 CET 2006
Hello there,
I ran into a minor problem using the xml.dom.minidom XML parser: an XML
document with a comment before the DOCTYPE declaration seems to leave the
DOM data structures in an inconsistent state.
Let's say I have a little test.xml file:
<?xml version="1.0"?>
<!-- comment -->
<!DOCTYPE test SYSTEM "test.dtd">
<test> <tag2> Hello world </tag2> </test>
and a little Python program to parse it:
from xml.dom.minidom import parse
dom = parse("test.xml")
print "document node:", dom
print len(dom.childNodes), "children"
print "first child:", dom.firstChild
print "next sibling:", dom.firstChild.nextSibling
The output of that program is:
document node: <xml.dom.minidom.Document instance at 0xb7b82b6c>
3 children
first child: <DOM Comment node " comment ">
next sibling: None
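One way to make this inconsistency visible is to compare each node's `childNodes` list with the chain reached by following `firstChild`/`nextSibling`. The sketch below does that recursively. The helper names `sibling_chain` and `links_consistent` are hypothetical, and the code assumes Python 3, unlike the Python 2 program above. On a parser affected by the problem described here, the document node would presumably fail the check, because the chain stops after the comment.

```python
from xml.dom import minidom


def sibling_chain(node):
    """Collect children by following firstChild/nextSibling links."""
    chain = []
    child = node.firstChild
    while child is not None:
        chain.append(child)
        child = child.nextSibling
    return chain


def links_consistent(node):
    """Return True if, at every level below `node`, the nextSibling chain
    visits exactly the nodes in childNodes, in the same order."""
    chain = sibling_chain(node)
    children = list(node.childNodes)
    if len(chain) != len(children):
        return False
    if any(a is not b for a, b in zip(chain, children)):
        return False
    return all(links_consistent(child) for child in children)


if __name__ == "__main__":
    # Same content as test.xml above, parsed from a string instead of a file.
    doc = minidom.parseString(
        '<?xml version="1.0"?>\n'
        '<!-- comment -->\n'
        '<!DOCTYPE test SYSTEM "test.dtd">\n'
        '<test> <tag2> Hello world </tag2> </test>\n'
    )
    print(len(doc.childNodes), "children")
    print("sibling links consistent:", links_consistent(doc))
```

The check compares node identity (`is`) rather than equality, so it only passes when both views refer to the very same node objects.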
That is, the document node does have three children (a comment node, a
DocumentType instance and an element), but the first child's nextSibling
pointer is None even though the DocumentType node follows it. This breaks
my algorithm, which is supposed to walk the entire DOM tree recursively
but stops after the first node instead.
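A walker that iterates over `childNodes` instead of chasing `nextSibling` avoids stopping early, assuming the `childNodes` list itself is complete (the output above does report all three children). A minimal Python 3 sketch follows. The generator name `walk` is hypothetical and not part of minidom.

```python
from xml.dom import minidom


def walk(node, depth=0):
    """Yield (depth, node) pairs in document order.

    Only childNodes is consulted, so a broken nextSibling pointer
    cannot cut the traversal short.
    """
    yield depth, node
    for child in node.childNodes:
        yield from walk(child, depth + 1)


if __name__ == "__main__":
    doc = minidom.parseString("<test> <tag2> Hello world </tag2> </test>")
    for depth, node in walk(doc):
        # Indent each node name by its depth in the tree.
        print("  " * depth + node.nodeName)
```

Because it is a generator, the same traversal can feed a filter or a counter without building the whole node list first.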
I'm not entirely sure whether this really is a bug in pyexpat or an
error in my XML file. I haven't found any hints as to whether an XML
document is allowed to have a comment before the DOCTYPE declaration.
xmllint doesn't seem to complain about it, though.
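For what it's worth, the XML 1.0 grammar defines the prolog as `prolog ::= XMLDecl? Misc* (doctypedecl Misc*)?` with `Misc ::= Comment | PI | S`, so a comment between the XML declaration and the DOCTYPE is well-formed. The sketch below, assuming Python 3 and using only the low-level `xml.parsers.expat` module, records the events the parser reports for test.xml. The helper name `prolog_events` is hypothetical. If expat delivers the comment and the DOCTYPE in order, the missing link would more likely come from the code that builds the DOM from these events than from the file itself.

```python
import xml.parsers.expat

# Contents of test.xml from the message above.
SAMPLE = b"""<?xml version="1.0"?>
<!-- comment -->
<!DOCTYPE test SYSTEM "test.dtd">
<test> <tag2> Hello world </tag2> </test>
"""


def prolog_events(data):
    """Parse `data` with expat and return (event, name) tuples in order."""
    events = []
    parser = xml.parsers.expat.ParserCreate()
    parser.CommentHandler = lambda text: events.append(("comment", text))
    parser.StartDoctypeDeclHandler = (
        lambda name, system_id, public_id, has_internal_subset:
        events.append(("doctype", name))
    )
    parser.StartElementHandler = lambda name, attrs: events.append(("start", name))
    # No external-entity handler is installed, so test.dtd is not fetched.
    parser.Parse(data, True)
    return events


if __name__ == "__main__":
    for event in prolog_events(SAMPLE):
        print(event)
```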
Cheers,
Ingo