
Hi again,

Stefan Behnel wrote:
<?xml version="1.0" encoding="UTF-8"?>
<doc xmlns:xi="http://www.w3.org/2001/XInclude">
 <xi:include href="doc.py " parse="text"/>
</doc>
Note the whitespace around the include element. What I get for the XPath call after running the include is:
['\n ', '#!/usr/bin/python\n\ns1 = \'3 < 4\'\ns2 = "hello;"\n', '\n']
So the new text was correctly added between the two existing text nodes. Internally, however, libxml2 adds special xinclude nodes around the included part as markers. When we collect text nodes for the ".text" property, we stop at the first xinclude node and only consider the text before it. This results in what you see for the text property:
'\n '
I consider this a bug in lxml. I think we should step over xinclude nodes when collecting text content.
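As a rough illustration of the difference between stopping at the marker nodes and stepping over them, here is a small pure-Python model. It is not lxml's actual implementation: the flat list of (kind, value) pairs, the node kind names and the collect_text helper are all made up for this sketch.

```python
# Hypothetical model, not lxml's real code: the children of <doc> are a flat
# list of (kind, value) pairs. The kind names below are invented for the sketch.
MARKERS = {"xinclude_start", "xinclude_end"}


def collect_text(nodes, step_over_markers):
    """Join the leading run of text nodes into one string."""
    parts = []
    for kind, value in nodes:
        if kind == "text":
            parts.append(value)
        elif kind in MARKERS and step_over_markers:
            continue  # skip the marker node and keep collecting
        else:
            break  # an element (or an unskipped marker) ends the text run
    return "".join(parts)


included = '#!/usr/bin/python\n\ns1 = \'3 < 4\'\ns2 = "hello;"\n'

# Node sequence after the include has run, with markers around the new text.
children = [
    ("text", "\n "),
    ("xinclude_start", None),
    ("text", included),
    ("xinclude_end", None),
    ("text", "\n"),
]

old_text = collect_text(children, step_over_markers=False)
new_text = collect_text(children, step_over_markers=True)

print(repr(old_text))  # only the text before the first marker
print(repr(new_text))  # all three text nodes joined together
```

With step_over_markers=False the model returns only the leading whitespace, which matches the '.text' value shown above. With step_over_markers=True it returns the three text nodes from the XPath result joined into one string.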
I fixed this. When collecting text nodes, we now step over xinclude nodes and continue.

Stefan