
Hi,
http://www.w3.org/TR/xpath/#section-Location-Steps says there are actually 3 steps of searching: axes, node-tests and predicates
1) as an axis (=outside square brackets and before path), "descendants::p" selects descendants of the currently considered set (initially, this is the root node), then subjects them to the node test ("p"=all <p> tags), so it would select all "p" nodes 2) as a predicate, however, it *selects the descendants of the currently considered node and checks if there's any that satisfies the node test*!
Maybe this is just expressed a bit mistakably, but: IMHO descendant::p does the same wherever applied. It selects the node set consisting of all nodes from the context node's descendant axis that fulfill the node test p.
Node test stage (http://www.w3.org/TR/xpath/#NT-NodeTest) has very limited functionality. In particular, it cannot use patterns except "*" (only single names), so we can't select both tags at the same time with it.
So, the expression you're looking for is either
(descendant::p|descendant::l)//*[@n=1]
(selects tags at axis stage) which would select all "p" tags, then all "l" tags, then iterate over all elements in their subtrees ("//*") and check every one for the condition, or
(//p|//l)//*[@n=1]
(1)
timeit.Timer("t.xpath('(//p|//l)//*[@n=1]')", setup='from __main__ import t').repeat() [42.82267999649048, 43.092921018600464, 43.01901602745056]
which is equivalent for our case: searching the spec for "//" finds http://www.w3.org/TR/xpath/#path-abbrev which says "// is short for /descendant-or-self::node()/", or
//*[(ancestor::p or ancestor::l) and @n=1]
(2)
timeit.Timer("t.xpath('//*[(ancestor::p or ancestor::l) and @n=1]')", setup='from __main__ import t').repeat() [52.73698687553406, 53.14989614486694, 52.96542406082153]
(selects tags at predicate stage) which would iterate over _all_ nodes ("//*" - subtree of root), and for every one check if it has "p" or "l" as an ancestor (which is much more work) etc, or
//*[name()='p' or name()='l']//*[@n=1]
(3)
timeit.Timer("""t.xpath('//*[name()="p" or name()="l"]//*[@n=1]')""", setup='from __main__ import t').repeat() [56.8190279006958, 56.754719972610474, 56.88860607147217]
to find both kinds of tags in one go and only then explore their subtrees - this looks like the most efficient way.
It's not, at least not for lxml/libxml2@2.9.1 according to timeit ;-) This will of course depend on the XPath implementation but my guess is that (3) is actually rather inefficient because the filter predicate [name()='p' or name()='l'] needs to get applied to all elements whereas for (1) no predicate evaluation is necessary for getting at the p and l elements - those are selected by a (union of) node set selection. I haven't looked at libxml2 sources, though.
"|" can be used in the 3rd case as well, but in a predicate, it can behave unpredictably:
In [125]: t.xpath("//*[(ancestor::p|@n=1)]") Out[125]: [<Element a at 0x163a8c8>, <Element b at 0x163a120>, <Element c at 0x163adf0>]
Wow, I didn't even know this is possible, same result with Xalan 2.7.1. So why does this not return /root/p/a[2]? Does this effectively translate to //*[(ancestor::p and @n=1) or @n=1] ?
In [126]: t.xpath("//*[(ancestor::p or @n=1)]") Out[126]: [<Element a at 0x163a8c8>, <Element a at 0x163afd0>, <Element b at 0x163a120>, <Element c at 0x163adf0>]
In [131]: t.xpath("//*[(@n=1|ancestor::p)]") XPathEvalError: Invalid type
Doesn't produce an error in Xalan 2.7.1, returns an empty result set.
..so you should better use "|" in a FilterExpr and "or" in a Predicate.
I somehow agree :-) but then again you could use it for things like //xs:element[@type=(//xs:simpleType/@name|//xs:complexType/@name)] which is analoguous to //xs:element[@type=//xs:simpleType/@name or @type=//xs:complexType/@name] So maybe "|" should be avoided as a "top-level" operator in predicates, for sanity reasons. Holger Landesbank Baden-Wuerttemberg Anstalt des oeffentlichen Rechts Hauptsitze: Stuttgart, Karlsruhe, Mannheim, Mainz HRA 12704 Amtsgericht Stuttgart