Hi lxml lovers,
I've discovered some strange behaviour when processing XIncludes. From my understanding, XInclude processing is an all-or-nothing operation, right? If you resolve them with the .xinclude() method, then _all_ of them are resolved (provided all included files are well-formed).
Let's consider three DocBook 5 files: main.xml, first.xml, and second.xml. main.xml includes first.xml, which in turn includes second.xml. The whole source code can be found in my Gist (see the reference at the end).
Basically, it looks like this:
-- main.xml --
  article
    title
    xi:include href="first.xml"

-- first.xml --
  section
    title
    xi:include href="second.xml"

-- second.xml --
  section
    title
    para The quick brown fox
To resolve any XInclude elements, this is what I do (excerpt from the check.py file):
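    import sys
    import textwrap

    from lxml import etree


    def parse(xmlfile, xinclude=True):
        """Parse xmlfile, optionally resolve XIncludes; return 0 on success, 10 on error."""
        try:
            xmlparser = etree.XMLParser(collect_ids=False, recover=False)
            tree = etree.parse(xmlfile, parser=xmlparser)
            if xinclude:
                # Resolve all xi:include elements in place
                tree.xinclude()
            return 0
        except (etree.XMLSyntaxError, etree.XIncludeError) as err:
            print("ERROR: %s" % err, file=sys.stderr)
            # Show the detailed libxml2 error log, indented
            print(textwrap.indent(str(err.error_log), prefix=" "), file=sys.stderr)
            return 10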
If all XML files are well-formed, this works fine.
However, consider a use case with a syntax error in one of your files. Let's assume I removed the </para> end tag in the file "second.xml".
When I process first.xml (which directly includes second.xml), I get the following expected error:
    $ python3 check.py first.xml
    ERROR: Opening and ending tag mismatch: para line 8 and section, line 10, column 11
    [...]

However, when I process the main.xml file (where there is one level in between), nothing happens:
    $ python3 check.py main.xml
    # -- no output --

Why? This wasn't expected! It seems that only the first level of XIncludes (first.xml) is resolved, and the error in the nested second.xml is silently ignored.
For comparison, this is what I get from the xmllint command:
    $ xmllint --noout --xinclude main.xml
    second.xml:7: parser error : Opening and ending tag mismatch: para line 8 and section
    </section>
              ^
    second.xml:9: parser error : Premature end of data in tag section line 4

    ^
    first.xml:5: element include: XInclude error : could not load second.xml, and no fallback was found
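As a stopgap until I understand what is going on, I could imagine checking every file in the include chain individually instead of relying on .xinclude() alone. The following is only a rough sketch under several assumptions: it uses the standard library's xml.etree.ElementTree rather than lxml (so it runs without extra dependencies), it only follows plain relative file paths in href, and it ignores xi:fallback and fragment identifiers. The names check_includes and demo are made up for illustration, and the three files in demo() are simplified stand-ins, not the real DocBook files from the Gist.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"


def check_includes(xmlfile, seen=None):
    """Return (path, message) pairs for every file in the include chain that fails to parse."""
    seen = set() if seen is None else seen
    path = os.path.abspath(xmlfile)
    if path in seen:  # guard against include loops
        return []
    seen.add(path)
    try:
        root = ET.parse(path).getroot()
    except (ET.ParseError, OSError) as err:
        return [(path, str(err))]
    errors = []
    for inc in root.iter(XI_INCLUDE):
        href = inc.get("href")
        # Only follow XML includes; parse="text" targets need not be well-formed.
        if href and inc.get("parse", "xml") == "xml":
            target = os.path.join(os.path.dirname(path), href)
            errors.extend(check_includes(target, seen))
    return errors


def demo():
    """Build a three-file chain whose innermost file lacks </para> (stand-in content)."""
    xi = 'xmlns:xi="http://www.w3.org/2001/XInclude"'
    files = {
        "main.xml": f'<article {xi}><title>Main</title><xi:include href="first.xml"/></article>',
        "first.xml": f'<section {xi}><title>First</title><xi:include href="second.xml"/></section>',
        "second.xml": "<section><title>Second</title><para>The quick brown fox</section>",
    }
    with tempfile.TemporaryDirectory() as tmp:
        for name, content in files.items():
            with open(os.path.join(tmp, name), "w", encoding="utf-8") as fh:
                fh.write(content)
        found = check_includes(os.path.join(tmp, "main.xml"))
        return [(os.path.basename(p), msg) for p, msg in found]


if __name__ == "__main__":
    for name, msg in demo():
        print(f"{name}: {msg}")
```

Starting from main.xml, this walks down to second.xml and reports the parse error there, regardless of how many levels lie in between. It is of course no replacement for proper XInclude processing, just a way to find the broken file.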
Is this a bug? Am I missing something? Any help would be much appreciated!
Thank you! :)
---- Reference  https://gist.github.com/tomschr/8d86797ba19c57f41ea2a47535e8b431