
Dec. 27, 2017
12:35 p.m.
I'm using the following code to extract the DOCTYPE with Python 2.7 and lxml 4.1.1:

    from lxml import etree
    from StringIO import StringIO

    if __name__ == '__main__':
        doc = etree.parse(StringIO('''<?xml version="1.0"?>
    <!DOCTYPE log4j:configuration SYSTEM "log4j.dtd">
    <log4j:configuration xmlns:log4j = "http://jakarta.apache.org/log4j/" debug="false">
      <a>tasty</a>
    </log4j:configuration>'''))
        print "Type: {}\n".format(doc.docinfo.doctype)

But it returns:

    Type: <!DOCTYPE configuration SYSTEM "log4j.dtd">

And not, as I expected:

    Type: <!DOCTYPE log4j:configuration SYSTEM "log4j.dtd">

Is this a bug in lxml? Is there a workaround for getting what I'm expecting?

=:-)

Kim Grønborg Nielsen
M kgn+lxml@network-it.dk
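One possible stopgap, sketched here under assumptions rather than offered as an lxml fix, is to read the DOCTYPE declaration straight from the raw XML text with a regular expression, bypassing `docinfo` entirely. The helper name `raw_doctype` and the pattern are hypothetical. The pattern assumes the declaration contains no `>` or `]` inside quoted literals, and that the text `<!DOCTYPE` does not appear earlier inside a comment.

```python
import re

# Hypothetical helper, not part of lxml: matches "<!DOCTYPE name ... >",
# optionally with a simple "[ ... ]" internal subset before the closing ">".
DOCTYPE_RE = re.compile(r'<!DOCTYPE\s+[^\[>]+(?:\[.*?\]\s*)?>', re.DOTALL)


def raw_doctype(xml_text):
    """Return the DOCTYPE declaration exactly as written, or None if absent."""
    match = DOCTYPE_RE.search(xml_text)
    return match.group(0) if match else None


XML = '''<?xml version="1.0"?>
<!DOCTYPE log4j:configuration SYSTEM "log4j.dtd">
<log4j:configuration xmlns:log4j = "http://jakarta.apache.org/log4j/" debug="false">
  <a>tasty</a>
</log4j:configuration>'''

doctype = raw_doctype(XML)
```

Because this works on the source text, the prefixed name is kept as written. The trade-off is that the regular expression is not a real XML parser and will misread unusual declarations.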