Hi, Jamie Norrish wrote:
On Thu, 2009-04-30 at 09:42 +0200, Stefan Behnel wrote:
It would be rarely used, I'd say. What sort of interesting XPath queries could you possibly do on a node that has no children, no attributes, and no tag name or namespace?
Besides selecting other nodes and values relative to the text? Yes, it is possible to use text_result.getparent() and proceed from there - but the downside is that, for some XPath expressions, the code has to modify the expression depending on whether text_result was the text or the tail of its parent, which is annoying.
Ok, I do see your use case, although I still don't know what your selections look like in practice. If you want a more predictable XPath result, maybe it would make sense to select the surrounding element instead of the plain text content. As I said, lxml.etree does not have a representation for text nodes. So by adding an xpath() method to text results, you'd end up with a rather fragile setup that might crash when you replace the text of a node, just because an XPath text result is still holding a reference to a now-dead text node, for example. So it's not just adding a method, it's more like rethinking concepts inside lxml.etree. I'm pretty sure this use case is not worth going there - especially since it's nothing that can't be done today, but rather an inconvenience.
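To make the inconvenience concrete, here is a minimal sketch of the getparent() workaround. It relies on the is_text and is_tail properties that lxml's "smart" string results carry; the helper name next_element_after and the sample document are made up for illustration only.

```python
from lxml import etree


def next_element_after(text_result):
    """Return the first element following an XPath text() result, or None.

    Hypothetical helper: the relative expression has to change depending
    on whether the string was the .text or the .tail of its owner.
    """
    owner = text_result.getparent()
    if text_result.is_tail:
        # Tail text sits after its owner, so look at the owner's siblings.
        expr = "following-sibling::*[1]"
    else:
        # Leading text sits inside its owner, before the first child.
        expr = "*[1]"
    found = owner.xpath(expr)
    return found[0] if found else None


root = etree.fromstring("<p>Hello <b>bold</b> tail text<i>x</i></p>")
texts = root.xpath("//text()")
for t in texts:
    following = next_element_after(t)
    print(repr(str(t)), "->", None if following is None else following.tag)
```

The branch on is_tail is exactly the per-expression adjustment described above; selecting the surrounding element in the first place avoids it.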
Also, XPath queries can return Elements and (special) strings, but also plain numbers and boolean values. So you'd still not have a common interface for all possible result types.
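As a rough illustration of the different result types (the sample document is invented for the example):

```python
from lxml import etree

root = etree.fromstring('<a><b>text</b><b/></a>')

elements = root.xpath("//b")          # list of Elements
strings = root.xpath("//b/text()")    # list of "smart" strings
count = root.xpath("count(//b)")      # plain float
flag = root.xpath("boolean(//c)")     # plain bool

for result in (elements, strings, count, flag):
    print(type(result).__name__, result)

# Only the string results know where they came from.
print(strings[0].getparent().tag)
print(hasattr(count, "getparent"))
```

An xpath() method on text results would therefore cover only one of these cases; numbers and booleans would remain plain Python values.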
Well, I'm not really asking for a common interface - only that XPath be enabled for the results of an XPath expression for text(). This would bring it into line with XSLT behaviour, for one.
Well, XSLT is a different language with a different tree model.
About using iterwalk: this wouldn't seem (on a quick perusal of the documentation) to easily allow me to get the preceding context of the text result, unless I picked some arbitrary earlier element as the starting point. What am I missing?
I guess I misjudged your use case when you first described it. iterwalk() will not allow you to access the text context preceding an element, only the text content of the element itself. I still do not have a clear idea of what you actually consider "text context". Does that take the tree structure into account (e.g. only within a certain parent element), or is it just any text content that precedes the XPath result in reverse document order, wherever it occurs in the tree? What about just stepping up parent by parent until the contained text content is long enough? Or, if it's too long, split it by the substring that XPath found, and strip the left and right part...

Stefan
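A rough sketch of the "step up parent by parent, then split" idea, under a few assumptions: the function name text_context and the min_chars parameter are hypothetical, "long enough" is taken to mean a fixed number of characters, and the split uses the first occurrence of the matched substring, which may be the wrong one if the same text repeats inside the ancestor.

```python
from lxml import etree


def text_context(text_result, min_chars=40):
    """Return (before, match, after) around an XPath text() result.

    Hypothetical helper: climbs towards the root until the enclosing
    element holds enough text, then splits that text around the match.
    """
    needle = str(text_result)
    node = text_result.getparent()
    if node is not None and text_result.is_tail:
        # Tail text lies outside its owner element, so start one level up.
        node = node.getparent()
    if node is None:
        return "", needle, ""
    content = "".join(node.itertext())
    while len(content) < min_chars + len(needle) and node.getparent() is not None:
        node = node.getparent()
        content = "".join(node.itertext())
    # Assumption: the first occurrence of the substring is the match itself.
    before, _, after = content.partition(needle)
    return before[-min_chars:], needle, after[:min_chars]


doc = etree.fromstring(
    "<doc><p>alpha beta <b>gamma</b> delta epsilon</p><p>zeta</p></doc>"
)
hit = doc.xpath("//b/text()")[0]
before, match, after = text_context(hit, min_chars=10)
print(repr(before), repr(match), repr(after))
```

Whether the climb should stop at a particular parent element, rather than at a character count, depends on the answer to the question about tree structure above.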