[lxml-dev] Can't load external DTD.
Hello everyone,

I've written this little program that refuses to work:

    from lxml import etree

    if __name__ == "__main__":
        xml_input = "C:\Desarrollo\pythontests\lxml\foo.xml"
        parser = etree.XMLParser(load_dtd = True, dtd_validation = True,
                                 attribute_defaults = True)
        doc = etree.parse(xml_input, parser)

Here's the traceback.

    Traceback (most recent call last):
      File "C:\Desarrollo\pythontests\lxml\dtd_loader.py", line 27, in <module>
        doc = etree.parse(xml_input, parser)
      File "lxml.etree.pyx", line 2515, in lxml.etree.parse
      File "parser.pxi", line 1755, in lxml.etree._parseDocument
      File "parser.pxi", line 1759, in lxml.etree._parseDocumentFromURL
      File "parser.pxi", line 1681, in lxml.etree._parseDocFromFile
      File "parser.pxi", line 826, in lxml.etree._BaseParser._parseDocFromFile
      File "parser.pxi", line 450, in lxml.etree._ParserContext._handleParseResultDoc
      File "parser.pxi", line 534, in lxml.etree._handleParseResult
      File "parser.pxi", line 476, in lxml.etree._raiseParseError
    lxml.etree.XMLSyntaxError: failed to load external entity "NULL", line 9, column 83

This is a snippet of foo.xml:

    <?xml version="1.0" encoding="iso-8859-1" ?>
    <!DOCTYPE rem:requirementsProject SYSTEM "C:\Desarrollo\pythontests\lxml\foo.dtd">
    ...

Then, I tried to write a custom resolver.

    from lxml import etree

    class DTDResolver(etree.Resolver):
        def resolve(self, url, id, context):
            print("Resolving (url, %s)(id, %s)"% (url,id))
            self.resolve_filename("C:\Desarrollo\pythontests\lxml\JENSEN.dtd", \
                                  context)

    if __name__ == "__main__":
        parser = etree.XMLParser(load_dtd = True, dtd_validation = True,
                                 attribute_defaults = True)
        parser.resolvers.add(DTDResolver())
        xml_input = "C:\Desarrollo\pythontests\lxml\foo.xml"
        doc = etree.parse(xml_input, parser)

But it still fails (same traceback). What am I doing wrong?

BTW, lxml version is 2.0.1.

Thanks in advance.
Nef Asus <nefasus@gmail.com> (NA) wrote:

NA> Hello everyone,
NA> I've written this little program that refuses to work:

NA> from lxml import etree
NA> if __name__ == "__main__":
NA>     xml_input = "C:\Desarrollo\pythontests\lxml\foo.xml"

Double your backslashes or use r"C:\Desarrollo\pythontests\lxml\foo.xml".

--
Piet van Oostrum <piet@cs.uu.nl>
URL: http://pietvanoostrum.com [PGP 8DAE142BE17999C4]
Private email: piet@vanoostrum.org
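The effect of a single backslash can be checked directly. In a regular Python string literal, `\f` is the form-feed escape, so the `\foo.xml` part of the path never contains a backslash followed by `f`. The sketch below uses a shortened, hypothetical path fragment (`lxml\foo.xml`, not the full path from the post) so that `\f` is the only escape involved:

```python
# A regular literal: "\f" is parsed as ONE form-feed character (0x0C),
# so the backslash and the letter "f" both disappear from the path.
broken = "lxml\foo.xml"

# Two ways to keep the backslash: double it, or use a raw string.
doubled = "lxml\\foo.xml"
raw = r"lxml\foo.xml"

print(repr(broken))            # shows the form feed as \x0c
print(doubled == raw)          # both spellings give the same string
print(len(broken), len(raw))   # the broken literal is one character shorter
```

Sequences such as `\D`, `\p` and `\l` are not recognized escapes, so Python leaves them alone (newer versions warn about them), which is why only part of the original path was damaged.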
Hi,

Nef Asus wrote:
> I've written this little program that refuses to work:
>
>     from lxml import etree
>
>     if __name__ == "__main__":
>         xml_input = "C:\Desarrollo\pythontests\lxml\foo.xml"
>         parser = etree.XMLParser(load_dtd = True, dtd_validation = True,
>                                  attribute_defaults = True)
>         doc = etree.parse(xml_input, parser)
>
> Here's the traceback.
>
>     Traceback (most recent call last):
>       File "C:\Desarrollo\pythontests\lxml\dtd_loader.py", line 27, in <module>
>         doc = etree.parse(xml_input, parser)
>       File "lxml.etree.pyx", line 2515, in lxml.etree.parse
>       File "parser.pxi", line 1755, in lxml.etree._parseDocument
>       File "parser.pxi", line 1759, in lxml.etree._parseDocumentFromURL
>       File "parser.pxi", line 1681, in lxml.etree._parseDocFromFile
>       File "parser.pxi", line 826, in lxml.etree._BaseParser._parseDocFromFile
>       File "parser.pxi", line 450, in lxml.etree._ParserContext._handleParseResultDoc
>       File "parser.pxi", line 534, in lxml.etree._handleParseResult
>       File "parser.pxi", line 476, in lxml.etree._raiseParseError
>     lxml.etree.XMLSyntaxError: failed to load external entity "NULL", line 9, column 83
>
> This is a snippet of foo.xml:
>
>     <?xml version="1.0" encoding="iso-8859-1" ?>
>     <!DOCTYPE rem:requirementsProject SYSTEM "C:\Desarrollo\pythontests\lxml\foo.dtd">
>     ...

Could you show me what line 9 in your XML file looks like?

Stefan
Hello Stefan,

this is my XML up to line 9:

    <?xml version="1.0" encoding="iso-8859-1" ?>
    <!--
    Old reference
    <!DOCTYPE rem:requirementsProject SYSTEM "C:\tmp\rem\foo.dtd">
    -->
    <!--
    <!DOCTYPE rem:requirementsProject SYSTEM "file:C:\\Desarrollo\\pythontests\lxml\foo.dtd">
    -->
    <!DOCTYPE rem:requirementsProject SYSTEM "C:\Desarrollo\pythontests\lxml\foo.dtd">

I omitted the commented lines in my first snippet.

PS: I'm sorry I sent the response straight to you and missed the mailing list.
Hi,

Nef Asus wrote:
>> lxml.etree.XMLSyntaxError: failed to load external entity "NULL", line 9, column 83
>
> this is my xml up to line 9:
>
>     <?xml version="1.0" encoding="iso-8859-1" ?>
>     <!--
>     Old reference
>     <!DOCTYPE rem:requirementsProject SYSTEM "C:\tmp\rem\foo.dtd">
>     -->
>     <!--
>     <!DOCTYPE rem:requirementsProject SYSTEM "file:C:\\Desarrollo\\pythontests\lxml\foo.dtd">
>     -->
>     <!DOCTYPE rem:requirementsProject SYSTEM "C:\Desarrollo\pythontests\lxml\foo.dtd">
>
> I omitted the commented lines in my first snippet.

Hmm, I think the "NULL" comes from the fact that you only use "SYSTEM", without providing an ID. libxml2 printf()'s the ID here.

So, well, it can't find it on your system. Have you tried something like

    "file://C:/Desarrollo/pythontests/lxml/foo.dtd"

?

And: have you made sure the file is actually there?

Stefan
Hi,

> Hmm, I think the "NULL" comes from the fact that you only use "SYSTEM",
> without providing an ID. libxml2 printf()'s the ID here.
>
> So, well, it can't find it on your system. Have you tried something like
>
>     "file://C:/Desarrollo/pythontests/lxml/foo.dtd"
>
> ?
>
> And: have you made sure the file is actually there?

All the files are in the right place, so that is not the problem.

I tried "file://C:/Desarrollo/pythontests/lxml/foo.dtd": bang, same failure. But "C:/Desarrollo/pythontests/lxml/foo.dtd" does work! "foo.dtd" works too (foo.xml is in the same location).

I'm wondering why, in the unsuccessful cases, the resolver receives both url and id set to None. Perhaps the parser doesn't like backslashes. Any idea?
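One way to avoid hand-editing the separators is to let `pathlib` produce the forward-slash form reported as working above. This is a minimal sketch, not something posted in the thread: `PureWindowsPath` only manipulates the string, so it runs on any platform. The `file:///` spelling (three slashes, empty host) is the conventional file-URL form for a drive-letter path; whether a particular libxml2 build accepts it was not tested in this thread, so treat that part as an assumption.

```python
from pathlib import PureWindowsPath

# The DTD location from the original post, written as a raw string so that
# the backslashes survive.
dtd_path = PureWindowsPath(r"C:\Desarrollo\pythontests\lxml\foo.dtd")

# Same path with forward slashes, the form that worked in the DOCTYPE.
forward = dtd_path.as_posix()

# Conventional file URL for that path. No percent-encoding is applied here,
# which is fine for this path but not for names with spaces or non-ASCII.
uri = "file:///" + forward

print(forward)
print(uri)
```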
Nef Asus <nefasus <at> gmail.com> writes:
> Hello everyone, I've written this little program that refuses to work:
>
>     from lxml import etree
>
>     if __name__ == "__main__":
>         xml_input = "C:\Desarrollo\pythontests\lxml\foo.xml"
>         parser = etree.XMLParser(load_dtd = True, dtd_validation = True,
>                                  attribute_defaults = True)
>         doc = etree.parse(xml_input, parser)
>
> Here's the traceback.
>
>     Traceback (most recent call last):
>       File "C:\Desarrollo\pythontests\lxml\dtd_loader.py", line 27, in <module>
>         doc = etree.parse(xml_input, parser)
>       File "lxml.etree.pyx", line 2515, in lxml.etree.parse
>       File "parser.pxi", line 1755, in lxml.etree._parseDocument
>       File "parser.pxi", line 1759, in lxml.etree._parseDocumentFromURL
>       File "parser.pxi", line 1681, in lxml.etree._parseDocFromFile
>       File "parser.pxi", line 826, in lxml.etree._BaseParser._parseDocFromFile
>       File "parser.pxi", line 450, in lxml.etree._ParserContext._handleParseResultDoc
>       File "parser.pxi", line 534, in lxml.etree._handleParseResult
>       File "parser.pxi", line 476, in lxml.etree._raiseParseError
>     lxml.etree.XMLSyntaxError: failed to load external entity "NULL", line 9, column 83
>
> This is a snippet of foo.xml:
>
>     <?xml version="1.0" encoding="iso-8859-1" ?>
>     <!DOCTYPE rem:requirementsProject SYSTEM "C:\Desarrollo\pythontests\lxml\foo.dtd">
>     ...
>
> Then, I tried to write a custom resolver.
>
>     from lxml import etree
>
>     class DTDResolver(etree.Resolver):
>         def resolve(self, url, id, context):
>             print("Resolving (url, %s)(id, %s)"% (url,id))
>             self.resolve_filename("C:\Desarrollo\pythontests\lxml\JENSEN.dtd", \
>                                   context)

I had the exact same problem with lxml 2.03 with a DITA XML file referencing the DITA DTDs (pretty complicated). Switching back to lxml 1.3.6 fixed the problem.

Is this problem fixed in any of the 2.x series?

Here's my resolver:

    import os
    from lxml import etree

    class DITA_DTD_Resolver(etree.Resolver):
        def __init__(self, dtdDir):
            # Directory that holds the local copies of the DITA DTDs.
            self.dtdDir = dtdDir

        def resolve(self, url, id, context):
            (entityName, ext) = os.path.splitext(url)
            #dtd = u'<!DOCTYPE %s PUBLIC "%s" "%s">' % (entityName, id, self.dtdDir + '/' + url)
            # Map the requested URL onto a file inside dtdDir.
            return self.resolve_filename(self.dtdDir + '/' + url, context)

Jim
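For comparison, here is a minimal, self-contained resolver sketch. It is not the code from either post: the class name `InMemoryDTDResolver`, the tiny DTD text, and the `foo.dtd` system URL are made up for illustration, and `resolve_string` is used instead of `resolve_filename` so that no files are needed. It assumes a reasonably recent lxml. Note that `resolve()` returns the result of the `resolve_*` call; a resolver that calls `resolve_filename` without `return` hands `None` back to the parser, which lxml treats as "not handled here".

```python
from lxml import etree


class InMemoryDTDResolver(etree.Resolver):
    """Hypothetical resolver that serves one DTD from a string."""

    # Illustrative DTD: an empty <root> element with a defaulted attribute.
    DTD_TEXT = '<!ELEMENT root EMPTY>\n<!ATTLIST root lang CDATA "en">'

    def resolve(self, system_url, public_id, context):
        # Only answer for the DTD we know about.
        if system_url and system_url.endswith("foo.dtd"):
            # The return is essential: it passes the input document back
            # to the parser.
            return self.resolve_string(self.DTD_TEXT, context)
        # Returning None defers to other resolvers and the default lookup.
        return None


# Same parser options as in the original post.
parser = etree.XMLParser(load_dtd=True, dtd_validation=True,
                         attribute_defaults=True)
parser.resolvers.add(InMemoryDTDResolver())

root = etree.fromstring('<!DOCTYPE root SYSTEM "foo.dtd"><root/>', parser)

# With attribute_defaults=True the default from the DTD is filled in.
print(root.tag, root.get("lang"))
```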
Hi,

Jim Hargrave wrote:
> Nef Asus <nefasus <at> gmail.com> writes:
>>     from lxml import etree
>>
>>     if __name__ == "__main__":
>>         xml_input = "C:\Desarrollo\pythontests\lxml\foo.xml"
>>         parser = etree.XMLParser(load_dtd = True, dtd_validation = True,
>>                                  attribute_defaults = True)
>>         doc = etree.parse(xml_input, parser)
>>
>> Here's the traceback.
>>
>>     Traceback (most recent call last):
>>       File "C:\Desarrollo\pythontests\lxml\dtd_loader.py", line 27, in <module>
>>         doc = etree.parse(xml_input, parser)
>>       File "lxml.etree.pyx", line 2515, in lxml.etree.parse
>>       File "parser.pxi", line 1755, in lxml.etree._parseDocument
>>       File "parser.pxi", line 1759, in lxml.etree._parseDocumentFromURL
>>       File "parser.pxi", line 1681, in lxml.etree._parseDocFromFile
>>       File "parser.pxi", line 826, in lxml.etree._BaseParser._parseDocFromFile
>>       File "parser.pxi", line 450, in lxml.etree._ParserContext._handleParseResultDoc
>>       File "parser.pxi", line 534, in lxml.etree._handleParseResult
>>       File "parser.pxi", line 476, in lxml.etree._raiseParseError
>>     lxml.etree.XMLSyntaxError: failed to load external entity "NULL", line 9, column 83
>>
>> This is a snippet of foo.xml:
>>
>>     <?xml version="1.0" encoding="iso-8859-1" ?>
>>     <!DOCTYPE rem:requirementsProject SYSTEM "C:\Desarrollo\pythontests\lxml\foo.dtd">
>
> I had the exact same problem with lxml 2.03 with a DITA XML file
> referencing the DITA DTDs (pretty complicated). Switching back to
> lxml 1.3.6 fixed the problem.
>
> Is this problem fixed in any of the 2.x series?

Thanks for pointing me at the problem. Here's a patch. Will be fixed in 2.1beta1 and 2.0.5 (when it comes out).

Stefan

=== src/lxml/parser.pxi
==================================================================
--- src/lxml/parser.pxi (revision 3984)
+++ src/lxml/parser.pxi (local)
@@ -333,7 +333,7 @@
                 c_context, _cstr(data))
         elif doc_ref._type == PARSER_DATA_FILENAME:
             c_input = xmlparser.xmlNewInputFromFile(
-                c_context, _cstr(doc_ref._data_bytes))
+                c_context, _cstr(doc_ref._filename))
         elif doc_ref._type == PARSER_DATA_FILE:
             file_context = _FileReaderContext(doc_ref._file, context, url)
             c_input = file_context._createParserInput(c_context)
participants (4)
- Jim Hargrave
- Nef Asus
- Piet van Oostrum
- Stefan Behnel