[lxml-dev] element of an xpath evaluation

Hi, very recently I needed a program that evaluates an XPath expression and displays the "result" graphically in a tree of the XML file that was used. I found XPath Explorer, a Java application, but it does quite a bit more than I need, and there are certain things that just don't work the way I want. I figured I could very easily provide something similar using lxml and PyQt4. However, if I want to highlight the tree node that the XPath matches, I have a "problem" when the XPath matches attributes or text nodes. So the question is: Is there a way, using lxml, to find out which element a certain non-element result of an XPath evaluation belongs to? Andreas -- You now have Asian Flu.

Hi Andreas, Andreas Pakulat wrote:
That sounds pretty interesting. Please post a link when you have something usable.
Not straight away. Both are returned as strings, so you lose the information about where they came from. You can try to run a second XPath expression to find the result text or attribute value in the tree, but that's bound to fail if the text data is not unique (which is pretty likely for attributes). A 'stupid idea' would be to fiddle with the XPath expression and add a function call after each traversed node that stores the element in a list. You could then trace the evaluation path. Something like "a/b//c[true()]/d/text()" -> "a[store(.)]/b[store(.)]//c[(true()) and store(.)]/d[store(.)]/text()" But that would require you to 'parse' the expression, and I don't know if you can get that done with regexps... Stefan
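The store(.) idea above can be sketched with an lxml extension function. This is only an illustration under assumptions: the sample document, the `visited` list and the rewritten expression are made up here, and the expression is rewritten by hand rather than by a parser.

```python
from lxml import etree

# Made-up sample document for the expression used in the thread.
root = etree.fromstring("<a><b><x><c><d>hello</d></c></x></b></a>")

visited = []  # elements handed to store(), in call order


def store(context, nodes):
    # lxml passes the evaluation context first, then the XPath arguments;
    # "." arrives as a list holding the current node.
    visited.extend(nodes)
    return True  # always true, so the predicate does not filter anything


# Register store() in the default (empty) function namespace.
etree.FunctionNamespace(None)["store"] = store

expr = "/a[store(.)]/b[store(.)]//c[true() and store(.)]/d[store(.)]/text()"
result = root.xpath(expr)
print(result, [el.tag for el in visited])
```

The recorded elements show which nodes the evaluation passed through, but producing the rewritten expression automatically is still the hard part.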

On 30.05.06 22:10:06, Stefan Behnel wrote:
I will.
Yeah, tell me ;-)
Uh oh. No. I guess I'll go with PyXML and its DOM model then. That'll give me a proper AttrNode for the attributes. Thanks for the clarification. Andreas -- You possess a mind not merely twisted, but actually sprained.

On 30.05.06 22:10:06, Stefan Behnel wrote:
I tried a few things, and it seems to me that running a second XPath expression with the extra step /parent::node() appended gives me the element node. Now the question is: Can I assume that the last step either contains text() or attribute::<attribute name> or @attrname? The only problem I see is that, when the XPath selects text nodes, I need to traverse the text children of the returned elements to know which strings belong to which elements. Are there other ways to get at text() or attribute nodes? Andreas -- Are you ever going to do the dishes? Or will you change your major to biology?
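A minimal sketch of the /parent::node() approach, assuming a made-up sample document. The helper name `owners` is hypothetical, and wrapping the expression in parentheses is an addition made here so that a union expression is handled as a whole.

```python
from lxml import etree

root = etree.fromstring('<doc><item id="1">one</item><p>a<br/>c</p></doc>')


def owners(tree, expr):
    # Hypothetical helper: evaluate expr again with a parent step appended.
    # Only meaningful when expr selects nodes (attributes, text, elements).
    return tree.xpath("(" + expr + ")/parent::node()")


attr_values = root.xpath("//item/@id")
attr_owners = owners(root, "//item/@id")

text_values = root.xpath("//p/text()")
text_owners = owners(root, "//p/text()")

print(attr_values, [el.tag for el in attr_owners])
print(text_values, [el.tag for el in text_owners])
```

The second query illustrates the problem mentioned above: two text nodes come back, but the parent node-set contains the `p` element only once, so the strings still have to be matched to their elements separately.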

Hi Andreas, Andreas Pakulat wrote:
Sure, good idea.
Now the question is: Can I assume that the last step either contains text() or attribute::<attribute name> or @attrname?
You mean as the result of an XPath expression? Well, you may get back bool values or generated strings (can you?), in which case you can't expect to find out what node (or nodes) they came from. Also, AFAIR, you can merge multiple XPath expressions into one and that case may be hard to detect. The last part of an XPath expression is not always what returned the result...
Note that you can get back a wild combination of strings, nodes and numbers, so there is a bit of work to do anyway.
Are there other ways to get at text() or attribute nodes?
What do you mean? Stefan
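To illustrate the mix of result types mentioned above, here is a small sketch that sorts out what `xpath()` hands back. The function name `describe` and the sample document are assumptions for this example only.

```python
from lxml import etree

root = etree.fromstring('<doc><item id="1">one</item><item id="2">two</item></doc>')


def describe(result):
    # Scalar results: XPath booleans, numbers and strings.
    if isinstance(result, bool):
        return "boolean"
    if isinstance(result, float):
        return "number"
    if isinstance(result, str):
        return "string"
    # Otherwise a node-set, returned as a list that can mix
    # elements with attribute/text values (strings).
    kinds = []
    for item in result:
        if etree.iselement(item):
            kinds.append("element")
        elif isinstance(item, str):
            kinds.append("string")
        else:
            kinds.append("other")
    return kinds


print(describe(root.xpath("count(//item)")))
print(describe(root.xpath("count(//item) > 1")))
print(describe(root.xpath("string(//item)")))
print(describe(root.xpath("//item[1] | //item[1]/@id")))
```

Only the "element" entries (and, with extra work, the strings inside a node-set) can be mapped back to tree nodes for highlighting.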

On 03.06.06 12:59:10, Stefan Behnel wrote:
I mean, if the result of an XPath is a list of strings, can I assume that it was created by either a text() or an attribute:: expression?
Well, you may get back bool values or generated strings (can you?), in which case you can't expect to find out what node (or nodes) they came from.
Well, for bool values you cannot get at the "matched" nodes anyway, and I think it doesn't make sense to try. The same is true, AFAIK, for generated strings. The evaluator can only highlight tree nodes that are part of the result of an XPath expression. It will probably show the result for generated strings or bools, but it cannot display any tree node for them anyway.
Hmm, could you give me an example for something like that? I'm not that familiar with XPath...
Ah, I didn't see "|" until now. Well, that makes the whole thing a bit "harder", because I can't tell whether a given string is created from a text node or is the value of an attribute. I'm thinking maybe I should just highlight the element for any text returned (regardless of whether it comes from an attribute or a text node) and not try to find the proper attribute or text... After all, you can easily see the attributes and text of the element.
Are there other ways to get at text() or attribute nodes?
What do you mean?
I mean, is there another way to have the text node of an element as the result of the XPath expression, other than having the text() function somewhere (probably at the end of the path)? The same goes for attributes: is there another way to get the values of attributes in the result, other than using attribute::(*|<name>) or @<name>? Andreas -- Don't hate yourself in the morning -- sleep till noon.
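For readers on current lxml releases: string results taken from the tree are "smart strings" that remember their origin, which addresses the "|" case above. The sketch below assumes a recent lxml (this feature was not available when the thread was written) and a made-up sample document.

```python
from lxml import etree

root = etree.fromstring('<doc><item id="1">one</item><item id="2">two</item></doc>')

origins = []
for value in root.xpath("//item/@id | //item/text()"):
    owner = value.getparent()  # element the string was taken from
    if value.is_attribute:
        kind = "attribute " + value.attrname
    elif value.is_tail:
        kind = "tail text"
    else:
        kind = "text"
    origins.append((kind, str(value), owner.tag))

for entry in origins:
    print(entry)
```

With this, a viewer could highlight the owning element for every string in a node-set result without rewriting or parsing the expression.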

Hi Andreas, Andreas Pakulat wrote:
What about "string(a)" ?
There's all sorts of weird expressions you could come up with...
It already makes it harder to find its parent element. You may still end up having to parse the expression to find partial '|' expressions etc.
Functions are a good way. Stefan
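The "string(a)" case can be checked directly. In current lxml releases (an assumption relative to the version discussed here), a string built by an XPath function has no origin, so `getparent()` returns None. The sample document is made up.

```python
from lxml import etree

root = etree.fromstring('<doc><item id="1">one</item><item id="2">two</item></doc>')

# string() converts the first matching node to a new string value.
generated = root.xpath("string(//item)")

# text() selects an existing text node from the tree.
selected = root.xpath("//item/text()")[0]

print(repr(generated), generated.getparent())
print(repr(selected), selected.getparent().tag)
```

Both values are the same text, but only the selected one can be traced back to an element.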

On 03.06.06 14:54:58, Stefan Behnel wrote:
There's all sorts of weird expressions you could come up with...
Thanks for that input. I guess for now I'll just skip attributes and text nodes and only highlight element nodes. Maybe I'll look into this again at a later time.
I might end up writing my own XPath parser, which is clearly out of my reach atm. Again, thanks for your help on this. After all, I myself only need XPaths that return element node-sets for my purposes... Andreas -- You never hesitate to tackle the most difficult problems.

participants (2)
-
Andreas Pakulat
-
Stefan Behnel