External DTDs are overwritten when several distinct ones are specified
data:image/s3,"s3://crabby-images/94c87/94c87c9f5ac8ead7b73eaf42652e69ef2acc9589" alt=""
Hello lxml-ers!

I have an existing DTD document that includes several external DTDs. Each of these external DTDs resolves (populating 'docinfo.internalDTD') if it is specified on its own, i.e. only one external DTD at a time in the DTD. However, if two or more external DTDs are specified, only the entities from the first one are loaded into the internalDTD. Am I doing something wrong, or is this a bug?

For an example, see the Python program below. Note that I have local copies of xhtml-lat1.ent and xhtml-symbol.ent, which can be downloaded from http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent and http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent respectively.

# EXAMPLE START
from lxml import etree
from io import BytesIO

parser = etree.XMLParser(load_dtd=True)
xml = b'''\
<?xml version='1.1' encoding='utf-8' ?>
<!DOCTYPE root [
<!ENTITY % HTMLlat1 PUBLIC
   "-//W3C//ENTITIES Latin 1 for XHTML//EN"
   "xhtml-lat1.ent">
%HTMLlat1;
<!ENTITY % HTMLsymbol PUBLIC
   "-//W3C//ENTITIES Symbols for XHTML//EN"
   "xhtml-symbol.ent">
%HTMLsymbol;
]>
<root></root>'''

tree = etree.parse(BytesIO(xml), parser)
print(tree.docinfo.internalDTD.entities())
# EXAMPLE END

In the output I see all the entities expected from xhtml-lat1.ent, but xhtml-symbol.ent shows up only as "<lxml.etree._DTDEntityDecl object name='HTMLsymbol' at 0x7fdbfc1c9b90>", and none of the entities present in that file, such as "Alpha", are listed.

This behaviour is present in both my Python 3 and Python 2 environments:

$ pip freeze | grep lxml
lxml==4.1.1
$ python2 --version
Python 2.7.15
$ python --version
Python 3.6.5

Note: I am able to work around this issue by combining the local DTD files into a single file.
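The workaround mentioned above (combining the local entity files into one) could be scripted roughly as follows. This is a minimal, stdlib-only sketch, not something lxml provides; the function names combine_entity_files and declared_general_entities are hypothetical, and it assumes the .ent files are UTF-8 (or plain ASCII) text. It also lists the general entity names found in the combined file, so you can confirm that names such as Alpha made it in.

```python
import re
from pathlib import Path

# Optional text declaration (e.g. <?xml version="1.0" encoding="utf-8"?>).
# It may only appear at the very start of an external entity, so it is
# stripped from each source before concatenation.
TEXT_DECL = re.compile(r"^\s*<\?xml[^>]*\?>", re.IGNORECASE)

# General (non-parameter) entity declarations: <!ENTITY name ...>.
# Rough heuristic: it does not skip declarations inside comments.
GENERAL_ENTITY = re.compile(r"<!ENTITY\s+([^\s%]\S*)")


def combine_entity_files(sources, dest):
    """Concatenate several .ent files into a single file at `dest`.

    Hypothetical helper: drops any leading text declaration from each
    source so the result stays one well-formed external parameter entity.
    """
    parts = []
    for src in sources:
        text = Path(src).read_text(encoding="utf-8")
        parts.append(TEXT_DECL.sub("", text, count=1))
    Path(dest).write_text("\n".join(parts), encoding="utf-8")
    return dest


def declared_general_entities(path):
    """Return the general entity names declared in an entity file, in order."""
    text = Path(path).read_text(encoding="utf-8")
    return GENERAL_ENTITY.findall(text)
```

The combined file would then be declared and referenced once in the internal subset, e.g. <!ENTITY % HTMLall SYSTEM "xhtml-all.ent"> followed by %HTMLall; (HTMLall and xhtml-all.ent are placeholder names).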
Ewan Willis