
Hello,

I'm using lxml to resolve XIncludes in a document. This works very well; however, I need to know which file each element originates from for error reporting.

If I understand correctly, the location should be stored in the 'base' and 'sourceline' properties of each element. Unfortunately, I can't seem to get the right file name from there. What I am getting is the same base for all the included files. The line numbers are correct, though.

Am I doing something wrong, or could that be a bug? I use python-lxml-3.2.1 and here is what I'm working with:

$ cat parent.xml
<parent xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="level1.xml"/>
  <xi:include href="level2.xml"/>
</parent>

$ cat level1.xml
<level1 xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="level2.xml"/>
</level1>

$ cat level2.xml
<level2>
  <content/>
</level2>

-------- The Code --------

$ cat try.py
#!/usr/bin/python

import sys
from lxml import etree
from StringIO import StringIO

def print_elem(e):
    print e.tag, e.base, e.sourceline
    for c in e.getchildren():
        print_elem(c)

doc = etree.parse(open("parent.xml", "r"))
doc.xinclude()
print etree.tostring(doc.getroot())
print ""

print_elem(doc.getroot())
print ""

# example from:
# http://lxml.de/api.html#xinclude-and-elementinclude
data = StringIO('''\
<doc xmlns:xi="http://www.w3.org/2001/XInclude">
<foo/>
<xi:include href="level2.xml"/>
</doc>''')

tree = etree.parse(data)
tree.xinclude()
print_elem(tree.getroot())
print ""

print("%-20s: %s" % ('Python', sys.version_info))
print("%-20s: %s" % ('lxml.etree', etree.LXML_VERSION))
print("%-20s: %s" % ('libxml used', etree.LIBXML_VERSION))
print("%-20s: %s" % ('libxml compiled', etree.LIBXML_COMPILED_VERSION))
print("%-20s: %s" % ('libxslt used', etree.LIBXSLT_VERSION))
print("%-20s: %s" % ('libxslt compiled', etree.LIBXSLT_COMPILED_VERSION))

------ Output ------

$ ./try.py
<parent xmlns:xi="http://www.w3.org/2001/XInclude">
  <level1 xmlns:xi="http://www.w3.org/2001/XInclude">
  <level2>
  <content/>
</level2>
</level1>
  <level2>
  <content/>
</level2>
</parent>

parent /home/rpazdera/work/lxml_bug/parent.xml 1
level1 /home/rpazdera/work/lxml_bug/parent.xml 1
level2 /home/rpazdera/work/lxml_bug/parent.xml 1
content /home/rpazdera/work/lxml_bug/parent.xml 2
level2 /home/rpazdera/work/lxml_bug/parent.xml 1
content /home/rpazdera/work/lxml_bug/parent.xml 2

doc None 1
foo None 2
level2 None 1
content None 2

Python              : sys.version_info(major=2, minor=7, micro=5, releaselevel='final', serial=0)
lxml.etree          : (3, 2, 1, 0)
libxml used         : (2, 9, 1)
libxml compiled     : (2, 9, 1)
libxslt used        : (1, 1, 28)
libxslt compiled    : (1, 1, 28)

Thanks,
Radek Pazdera

Radek Pazdera, 22.08.2013 11:14:
The problem here is that the original document is lost after the XInclude run (from the POV of the merged tree), and it's the document that knows the original source base of the element. I.e., there is only one base URL for the whole document.

The fact that it's the Element object that provides the "base" property can be considered a design quirk in that regard.

In any case, I wouldn't know any way to recover the information where a specific included element originally came from after the XInclude run.

Stefan
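
One possible workaround, offered as a sketch under assumptions and not taken from the thread: since the origin cannot be recovered after the XInclude run, expand the includes by hand and note which file each element was parsed from while doing so. The example uses the standard library's xml.etree.ElementTree instead of lxml so that it runs without extra dependencies. The helper names (load_with_sources, _resolve, demo) and the side table of sources are hypothetical. It only handles a plain xi:include with a relative file href, with no xpointer, fallback, parse="text" or cycle detection, and it does not track line numbers, because ElementTree elements have no sourceline.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"


def load_with_sources(path, sources=None):
    """Parse `path`, expand xi:include elements, map each element to its file.

    Hypothetical helper: returns (root, sources), where `sources` is a dict
    keyed by Element whose values are the path the element was parsed from.
    """
    if sources is None:
        sources = {}
    root = ET.parse(path).getroot()
    _resolve(root, path, sources)
    return root, sources


def _resolve(elem, path, sources):
    # Record the origin before descending, so every element gets an entry.
    sources[elem] = path
    base_dir = os.path.dirname(path)
    for index, child in enumerate(list(elem)):
        if child.tag == XI_INCLUDE:
            # Assumption: href is a relative file path, resolved against the
            # directory of the including file.
            target = os.path.join(base_dir, child.get("href"))
            included, _ = load_with_sources(target, sources)
            included.tail = child.tail  # keep the text that followed the include
            elem[index] = included
        else:
            _resolve(child, path, sources)


def demo():
    """Recreate the three files from the post; return (tag, file name) pairs."""
    files = {
        "parent.xml": (
            '<parent xmlns:xi="http://www.w3.org/2001/XInclude">\n'
            '  <xi:include href="level1.xml"/>\n'
            '  <xi:include href="level2.xml"/>\n'
            '</parent>\n'
        ),
        "level1.xml": (
            '<level1 xmlns:xi="http://www.w3.org/2001/XInclude">\n'
            '  <xi:include href="level2.xml"/>\n'
            '</level1>\n'
        ),
        "level2.xml": "<level2>\n  <content/>\n</level2>\n",
    }
    with tempfile.TemporaryDirectory() as tmp:
        for name, text in files.items():
            with open(os.path.join(tmp, name), "w") as handle:
                handle.write(text)
        root, sources = load_with_sources(os.path.join(tmp, "parent.xml"))
        # Walk the merged tree in document order and look up each origin.
        return [(e.tag, os.path.basename(sources[e])) for e in root.iter()]


if __name__ == "__main__":
    for tag, name in demo():
        print(tag, name)
```

With the three files from the original post, this attributes parent to parent.xml, level1 to level1.xml, and both level2/content subtrees to level2.xml. The same idea might carry over to lxml by parsing each included file separately and recording element.base and element.sourceline before merging, but that variant is untested here.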

Participants (2): Radek Pazdera, Stefan Behnel