Re: [lxml-dev] Some XPath questions...
Mike Meyer wrote:
In <4689898E.9080509@colorstudy.com>, Ian Bicking <ianb@colorstudy.com> typed:
Stefan Behnel wrote:
So when I use // it works. Huh. I prefer descendant-or-self, because I find it peculiar to do a search from the root when you've called the method on some particular element (that may not be at the root). There's also ".//*". That seems to be equivalent to //*, i.e., // goes directly to the root regardless of context.
Not quite. '//*' always goes to the root. './/*' starts at the current node and matches from there down. If you always test at the root of the document, they'll look the same.
It seems to be changing the results when I replace 'descendant-or-self::' with './/'. I want to include the current node if it matches; at least to me, that seems most logical. Also necessary when I was doing microformat parsing, as a single element can have multiple roles. It seems like .// excludes the current node, only looking at descendants.
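The difference between the three forms discussed above can be checked with a small sketch. It assumes lxml is installed; the sample document and the `classes` helper are made up for illustration. In XPath 1.0, `.//` abbreviates `./descendant-or-self::node()/`, so `.//div` selects div children of the context node and of its descendants, which is why the context node itself is never returned.

```python
from lxml import etree

# Hypothetical sample document: nested divs plus one sibling div.
root = etree.fromstring(
    "<root><div class='a'><div class='b'><p/></div></div><div class='c'/></root>"
)
inner = root[0]  # context node: <div class='a'>


def classes(nodes):
    """Return the class attribute of each matched element."""
    return [n.get("class") for n in nodes]


# '//' is anchored at the document root, regardless of the context node.
from_root = classes(inner.xpath("//div"))
# './/' searches below the context node and does not include it.
below = classes(inner.xpath(".//div"))
# 'descendant-or-self::' includes the context node when it matches.
with_self = classes(inner.xpath("descendant-or-self::div"))

print(from_root)  # ['a', 'b', 'c']
print(below)      # ['b']
print(with_self)  # ['a', 'b']
```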
>> div:empty (no children, including text, maybe not including whitespace).
> Ouch. Let me think about that one.
Yeah, I couldn't figure that one out. I thought this might work:
>>> xpath('E:empty')
e[count(./children::*) = 0 and string(.) = '']
But maybe I don't understand how count() works; this isn't a valid XPath expression.
You want "child" not "children". Using normalize-space(.) instead of string(.) will exclude whitespace. This does assume you are ignoring comments and PIs; I believe that's the behavior you want.
Cool, that seems to work right. What about "e[not(*) and not(normalize-space())]"?
Yes, that works too.
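As a rough check of the expressions in this exchange, the sketch below runs them against a made-up document (the element names and `id` values are assumptions, and lxml is assumed to be installed). It shows that the explicit and the shorter form select the same elements here, and that string(.) treats a whitespace-only element as non-empty.

```python
from lxml import etree

# Hypothetical test document covering the cases discussed in the thread.
doc = etree.fromstring(
    "<r>"
    "<e id='empty'/>"
    "<e id='space'>  \n </e>"
    "<e id='text'>hi</e>"
    "<e id='child'><x/></e>"
    "<e id='comment'><!-- note --></e>"
    "</r>"
)

explicit = "e[count(./child::*) = 0 and normalize-space(.) = '']"
implicit = "e[not(*) and not(normalize-space())]"
strict = "e[count(./child::*) = 0 and string(.) = '']"


def ids(expr):
    """Return the id attribute of every element the expression selects."""
    return [el.get("id") for el in doc.xpath(expr)]


print(ids(explicit))  # ['empty', 'space', 'comment']
print(ids(implicit))  # ['empty', 'space', 'comment']
print(ids(strict))    # ['empty', 'comment']
```

The element holding only a comment is selected in all three cases, since comments are neither child elements nor part of the string value.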
That's the 'implicit conversion' I was talking about. You're relying on 0 and the empty string being false. It's a standard idiom, and pythonic, but I'm not sure you want to use it in automatically generated code, since it means you can't generalize the code from "has 0 children" to "has n children".
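To make the implicit conversion concrete, here is a small sketch, assuming lxml is installed; the document and the `count_with_children` helper are hypothetical. The explicit count() form extends to "has n children" by changing one number, while not(*) only expresses the zero case.

```python
from lxml import etree

# Hypothetical document: e elements with 0, 1 and 2 child elements.
doc = etree.fromstring("<r><e/><e><x/></e><e><x/><x/></e></r>")

# XPath converts the number 0 and the empty string to false.
zero_is_false = not bool(doc.xpath("boolean(0)"))
empty_is_false = not bool(doc.xpath("boolean('')"))


def count_with_children(n):
    """Count e elements that have exactly n child elements."""
    # $n is bound through lxml's keyword-argument XPath variables.
    return int(doc.xpath("count(e[count(*) = $n])", n=n))


print(zero_is_false, empty_is_false)        # True True
print(count_with_children(0))               # 1
print(count_with_children(2))               # 1
print(int(doc.xpath("count(e[not(*)])")))   # 1
```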
In this case it's a fixed expression used for e:empty, and nothing else, so it seems fine. And possibly makes the resulting expression a bit easier to recognize from its CSS roots.
--
Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org | Write code, do good | http://topp.openplans.org/careers
Ian Bicking wrote:
> >>> xpath('E:empty')
> e[count(./children::*) = 0 and string(.) = '']
> But maybe I don't understand how count() works; this isn't a valid XPath expression.
You want "child" not "children". Using normalize-space(.) instead of string(.) will exclude whitespace. This does assume you are ignoring comments and PIs; I believe that's the behavior you want.
Cool, that seems to work right. What about "e[not(*) and not(normalize-space())]"?
Yes, that works too.
That's the 'implicit conversion' I was talking about. You're relying on 0 and the empty string being false. It's a standard idiom, and pythonic, but I'm not sure you want to use it in automatically generated code, since it means you can't generalize the code from "has 0 children" to "has n children".
In this case it's a fixed expression used for e:empty, and nothing else, so it seems fine. And possibly makes the resulting expression a bit easier to recognize from its CSS roots.
It's also likely faster. I don't think libxml2 optimises the comparisons, so evaluating "not(*)" can stop and return false as soon as it finds the first child element, while "count(./child::*) = 0" has to count all children before it sees that the number is bigger than 0.
Stefan
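One way to probe the speed claim is a quick timing sketch like the one below. It assumes lxml is installed; the element size and repeat count are arbitrary choices, and the timings depend on the machine and the libxml2 version, so it does not by itself confirm how libxml2 evaluates either expression.

```python
import timeit

from lxml import etree

# Hypothetical worst case for counting: one element with many children.
wide = etree.Element("e")
for _ in range(10_000):
    etree.SubElement(wide, "x")

# Compile both forms once so only evaluation is timed.
short_form = etree.XPath("not(*)")
count_form = etree.XPath("count(./child::*) = 0")

# Both must agree that the element is not empty.
same_answer = bool(short_form(wide)) == bool(count_form(wide))

t_not = timeit.timeit(lambda: short_form(wide), number=200)
t_count = timeit.timeit(lambda: count_form(wide), number=200)

print(f"same answer: {same_answer}")
print(f"not(*):               {t_not:.4f}s")
print(f"count(child::*) = 0:  {t_count:.4f}s")
```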