[lxml-dev] xinclude bug? - lxml - The Python XML Toolkit

24 Sep 2006

I'm working on a project that will use lxml's XInclude functionality to
insert the contents of Python files into an XML document, and I have noticed a
possible bug.  When you xinclude with the parse attribute set to "text", the
text frequently (though not always) gets loaded into multiple adjacent text
nodes, so if you access the text attribute of the containing element, you
only get part of the actual text.  You can verify this by calling

element.xpath('text()')

on the container: you get back a list with multiple strings instead of one.
Is this how things are supposed to work?
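The workaround implied above is to join every string that `xpath('text()')` returns. Below is a minimal, self-contained sketch, assuming a writable temporary directory. It recreates `doc.xml` and `doc.py` from this post, runs the XInclude step, and concatenates the text nodes. The helper name `collect_text_nodes` is hypothetical, and the number of text nodes it reports may differ between lxml/libxml2 versions.

```python
import os
import tempfile

from lxml import etree


def collect_text_nodes(element):
    """Hypothetical helper: join every text-node child of `element`.

    element.xpath('text()') returns one string per text node, so joining
    them recovers the full content even when it is split across nodes.
    """
    return "".join(element.xpath("text()"))


# Recreate doc.xml and doc.py from the post in a scratch directory.
workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, "doc.py"), "w") as f:
    f.write("#!/usr/bin/python\n\ns1 = '3 < 4'\ns2 = \"hello;\"\n")
with open(os.path.join(workdir, "doc.xml"), "w") as f:
    f.write(
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<doc xmlns:xi="http://www.w3.org/2001/XInclude">\n'
        '    <xi:include href="doc.py" parse="text"/>\n'
        "</doc>\n"
    )

tree = etree.parse(os.path.join(workdir, "doc.xml"))
tree.xinclude()  # replaces <xi:include> with the raw text of doc.py
root = tree.getroot()

nodes = root.xpath("text()")  # one string per text node under <doc>
print(len(nodes), "text node(s)")
full = collect_text_nodes(root)
print(full)
```

Because the included content has no child elements, `root.xpath('string()')` or `"".join(root.itertext())` should yield the same text; both are standard lxml/XPath calls rather than anything specific to this thread.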

Also, escaping seems to occur in strings accessed through the "text" attribute
of xincluded content, but not in strings retrieved via xpath as described
above.  Is there a reliable way to reverse the escaping, so that the
original contents of the xincluded file can be retrieved?  I assume that
xml.sax.saxutils.unescape() would work, but I don't know for sure.
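A quick way to test that assumption is sketched below. `xml.sax.saxutils.unescape()` reverses the entities `&amp;`, `&lt;` and `&gt;` (plus any extra ones passed in its optional dict). It does not undo the Python-level escaping that `repr()` adds, such as `\n` or quote escapes. The sample strings are illustrative only. The last lines check whether lxml's `.text` already holds decoded characters for a parsed entity reference.

```python
from xml.sax.saxutils import escape, unescape

from lxml import etree

# Illustrative input mixing the characters XML escaping cares about.
original = "s1 = '3 < 4' & s2 = \"hello;\""

# escape() turns &, < and > into entities; unescape() reverses exactly those.
escaped = escape(original)
print(escaped)
restored = unescape(escaped)
print(restored == original)

# For a parsed document, .text holds the decoded character, not "&lt;".
root = etree.fromstring("<doc>3 &lt; 4</doc>")
print(repr(root.text))
```

If the "escaping" seen in the output disappears when the string is printed directly instead of through `repr()`, it is Python's repr formatting rather than XML escaping, and no unescaping step is needed.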

I've pasted some example code below to demonstrate the seemingly broken
"text" attribute and the different escaping styles.  Thanks,

Greg

doc.xml
-----------------------------
<?xml version="1.0" encoding="UTF-8"?>
<doc xmlns:xi="http://www.w3.org/2001/XInclude">
    <xi:include href="doc.py" parse="text"/>
</doc>

doc.py
---------------------------
#!/usr/bin/python

s1 = '3 < 4'
s2 = "hello;"

test.py
--------------------------
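from lxml import etree

tree = etree.parse('doc.xml')
tree.xinclude()  # resolve <xi:include> elements in place
root = tree.getroot()
print(repr(root.text))  # text attribute of the containing element
print('----')
print(root.xpath('text()'))  # one string per text node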


participants (2): Greg Steffensen, Stefan Behnel