[lxml-dev] ancestor-or-self
Hi,

I'm observing that the XPath axis "ancestor-or-self" does not function properly. The outputs below demonstrate the case. In the first one the whole tree is printed, and in the second one ancestor-or-self is used, but the result does not differ.

I'll open a ticket for this as a bug, unless someone tells me that I'm missing a point.

Thanks,
Polat Tuzla

In [309]: print etree.tostring(root, pretty_print=True)
<a>
  <b>
    <c/>
    <x>
      <z/>
    </x>
  </b>
</a>

In [310]: print etree.tostring(root.xpath("/a/b/c/ancestor-or-self::*")[0], pretty_print=True)
   .....:
<a>
  <b>
    <c/>
    <x>
      <z/>
    </x>
  </b>
</a>
Polat Tuzla, 09.11.2009 14:01:
> In [310]: print etree.tostring(root.xpath("/a/b/c/ancestor-or-self::*")[0], pretty_print=True)
>    .....:
> <a>
>   <b>
>     <c/>
>     <x>
>       <z/>
>     </x>
>   </b>
> </a>

Note that you only look at the first result using the "[0]" subscript, which in this case is the root node.

Stefan
Thank you for your response. I looked at the other results, and they did not seem to obey the XPath axis either. By using ancestor-or-self, I'm expecting an output like this:

<a>
  <b>
    <c/>
  </b>
</a>

But the other results that are returned to me are:

In [311]: print etree.tostring(root.xpath("/a/b/c/ancestor-or-self::*")[1], pretty_print=True)
   .....:
<b>
  <c/>
  <x>
    <z/>
  </x>
</b>

In [312]: print etree.tostring(root.xpath("/a/b/c/ancestor-or-self::*")[2], pretty_print=True)
   .....:
<c/>

In [313]: print etree.tostring(root.xpath("/a/b/c/ancestor-or-self::*")[3], pretty_print=True)
   .....:
IndexError: list index out of range

On Mon, Nov 9, 2009 at 3:45 PM, Stefan Behnel <stefan_ml@behnel.de> wrote:
> Note that you only look at the first result using the "[0]" subscript, which in this case is the root node.
Polat Tuzla, 09.11.2009 15:17:
> Thank you for your response. I looked at the other results, and they did not seem to obey the XPath axis either. By using ancestor-or-self, I'm expecting an output like this:
>
> <a>
>   <b>
>     <c/>
>   </b>
> </a>
>
> But the other results that are returned to me are:
>
> In [311]: print etree.tostring(root.xpath("/a/b/c/ancestor-or-self::*")[1], pretty_print=True)
>    .....:
> <b>
>   <c/>
>   <x>
>     <z/>
>   </x>
> </b>

An XPath query will not construct a new tree for you. What you see here is the result of serialising the second node in the result set, including its subtree *as defined in the document*. This has nothing to do with the query you ran *before* the serialisation, which correctly returned the matching nodes in a list.

Stefan
OK. I see.

Thank you all for the quick responses.

Regards,
Polat

On Mon, Nov 9, 2009 at 7:16 PM, Stefan Behnel <stefan_ml@behnel.de> wrote:
> An XPath query will not construct a new tree for you. What you see here is the result of serialising the second node in the result set, including its subtree *as defined in the document*. This has nothing to do with the query you ran *before* the serialisation, and which correctly returned the matching nodes in a list.
> Hi,
>
> I'm observing that the XPath axis "ancestor-or-self" does not function properly. The outputs below demonstrate the case. In the first one the whole tree is printed, and in the second one ancestor-or-self is used, but the result does not differ.
>
> I'll open a ticket for this as a bug, unless someone tells me that I'm missing a point.

You're missing a point :)

> In [309]: print etree.tostring(root, pretty_print=True)
> <a>
>   <b>
>     <c/>
>     <x>
>       <z/>
>     </x>
>   </b>
> </a>

Ok, so you print out root here.

> In [310]: print etree.tostring(root.xpath("/a/b/c/ancestor-or-self::*")[0], pretty_print=True)
>    .....:
> <a>
>   <b>
>     <c/>
>     <x>
>       <z/>
>     </x>
>   </b>
> </a>

Note how you print out root again, now:

    root.xpath("/a/b/c/ancestor-or-self::*")
    [<Element a at 266930>, <Element b at 2668a0>, <Element c at 266960>]
    root.xpath("/a/b/c/ancestor-or-self::*")[0] is root
    True

xpath() returns a list of elements in this case, of which you select the first item, which is root.

ancestor-or-self is a forward axis and the position of nodes in a forward axis is defined in terms of document order:

See the XPath Rec: 2.4 Predicates

    [...] Thus, the ancestor, ancestor-or-self, preceding, and preceding-sibling axes are reverse axes; all other axes are forward axes. [...] The proximity position of a member of a node-set with respect to an axis is defined to be the position of the node in the node-set ordered in document order if the axis is a forward axis and ordered in reverse document order if the axis is a reverse axis. The first position is 1.

Holger
I might have talked nonsense here:
> ancestor-or-self is a forward axis and the position of nodes in a forward axis is defined in terms of document order:
>
> See the XPath Rec: 2.4 Predicates
>
>     [...] Thus, the ancestor, ancestor-or-self, preceding, and preceding-sibling axes are reverse axes; all other axes are forward axes. [...] The proximity position of a member of a node-set with respect to an axis is defined to be the position of the node in the node-set ordered in document order if the axis is a forward axis and ordered in reverse document order if the axis is a reverse axis. The first position is 1.

First of all, ancestor-or-self is a *reverse axis*, so the "proximity position" is in reverse document order.

Second, I now think that this "proximity position" is only relevant with regard to positional predicate filtering, not with regard to the order of nodes in the XPath result node-set. E.g.

    root.xpath("/a/b/c/ancestor-or-self::*[position()=1]")
    [<Element c at 266870>]
    root.xpath("/a/b/c/ancestor-or-self::*[position()=2]")
    [<Element b at 2668d0>]
    root.xpath("/a/b/c/ancestor-or-self::*[position()=3]")
    [<Element a at 266930>]

As you can see, position numbering indeed follows reverse document order here.

Third, I *think* that, as the result of the XPath expression "/a/b/c/ancestor-or-self::*" is a node-set, the order of nodes in the node-set is an implementation detail; document order might be a good choice to have consistent results.

Clarifications welcome.

Holger
jholg@gmx.de wrote:
> Third, I *think* that, as the result of the XPath expression "/a/b/c/ancestor-or-self::*" is a node-set, the order of nodes in the node-set is an implementation detail; document order might be a good choice to have consistent results.
>
> Clarifications welcome.

"An axis is either a forward axis or a reverse axis. An axis that only ever contains the context node or nodes that are after the context node in document order is a forward axis. An axis that only ever contains the context node or nodes that are before the context node in document order is a reverse axis."

Forward and reverse axes both contain the nodes in document order.

"The proximity position of a member of a node-set with respect to an axis is defined to be the position of the node in the node-set ordered in document order if the axis is a forward axis and ordered in reverse document order if the axis is a reverse axis."

In a reverse axis the 'proximity position' is defined in reverse order.

root.xpath("/a/b/c/ancestor-or-self::*") returns the nodes in document order and root.xpath("/a/b/c/ancestor-or-self::*")[0] is the first node in document order.

It's a thinko, not a bug.

--
Marcello Perathoner
webmaster@gutenberg.org
jholg@gmx.de wrote:
> Clarifications welcome.

I found a better explanation. XPath 2.0 clarifies all this:

"[Definition: An axis step returns a sequence of nodes that are reachable from the context node via a specified axis. Such a step has two parts: an axis, which defines the "direction of movement" for the step, and a node test, which selects nodes based on their kind, name, and/or type annotation.]

If the context item is a node, an axis step returns a sequence of zero or more nodes; otherwise, a type error is raised [err:XPTY0020]. The resulting node sequence is returned in document order. An axis step may be either a forward step or a reverse step, followed by zero or more predicates."

---- http://www.w3.org/TR/xpath20/#dt-axis-step

"Note: When using predicates with a sequence of nodes selected using a reverse axis, it is important to remember that the context positions for such a sequence are assigned in reverse document order. For example, preceding::foo[1] returns the first qualifying foo element in reverse document order, because the predicate is part of an axis step using a reverse axis. By contrast, (preceding::foo)[1] returns the first qualifying foo element in document order, because the parentheses cause (preceding::foo) to be parsed as a primary expression in which context positions are assigned in document order. Similarly, ancestor::*[1] returns the nearest ancestor element, because the ancestor axis is a reverse axis, whereas (ancestor::*)[1] returns the root element (first ancestor in document order).

The fact that a reverse-axis step assigns context positions in reverse document order for the purpose of evaluating predicates does not alter the fact that the final result of the step is always in document order."

---- http://www.w3.org/TR/xpath20/#id-predicates

--
Marcello Perathoner
webmaster@gutenberg.org
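The difference between predicate positions and result order can be checked with a short sketch, assuming lxml is installed and using the sample tree from this thread. lxml evaluates XPath 1.0 rather than 2.0, but the proximity-position rule quoted earlier from the XPath 1.0 Recommendation gives the same outcome for these expressions. The `tags` helper is only a convenience defined here for illustration.

```python
from lxml import etree

root = etree.fromstring("<a><b><c/><x><z/></x></b></a>")


def tags(expr):
    """Illustrative helper: evaluate `expr` and return the tag names."""
    return [el.tag for el in root.xpath(expr)]


# The result list itself comes back in document order.
whole_axis = tags("/a/b/c/ancestor-or-self::*")

# A predicate attached to the reverse-axis step counts positions in
# reverse document order, so [1] is the context node <c> itself.
nearest = tags("/a/b/c/ancestor-or-self::*[1]")

# Parentheses turn the step into a primary expression, so [1] now
# counts in document order and selects the root element.
first_in_doc = tags("(/a/b/c/ancestor-or-self::*)[1]")

# Same contrast on the plain ancestor axis.
nearest_ancestor = tags("/a/b/c/ancestor::*[1]")
root_ancestor = tags("(/a/b/c/ancestor::*)[1]")

print(whole_axis, nearest, first_in_doc, nearest_ancestor, root_ancestor)
```

Indexing the returned Python list with `[0]` behaves like the parenthesised form, since it operates on the already ordered result rather than on the axis step.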
participants (4)
- jholg@gmx.de
- Marcello Perathoner
- Polat Tuzla
- Stefan Behnel