New subject: [lxml-dev] Some XPath questions...

2 Jul 2007


Mike Meyer wrote:
...
In <4689898E.9080509@colorstudy.com>, Ian Bicking <ianb@colorstudy.com> typed:
...
Stefan Behnel wrote:
...
...
So when I use // it works.  Huh.  I prefer descendant-or-self, because I 
find it peculiar to do a search from the root when you've called the 
method on some particular element (that may not be at the root).
There's also ".//*".
That seems to be equivalent to //*, i.e., // goes directly to the root 
regardless of context.
Not quite. '//*' always goes to the root. './/*' starts at the current
node and matches from there down. If you always test at the root of
the document, they'll look the same.
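
A small sketch of that difference, assuming lxml is installed; the sample document and the variable names are made up for illustration. The context node is deliberately not the root, since that is the only case where the two expressions differ:

```python
from lxml import etree

# Made-up document; <a> is the context node and is not the root.
doc = etree.fromstring("<root><a><b/><c/></a><d/></root>")
a = doc.find("a")

# '//*' is absolute: it searches from the document root,
# no matter which element the method was called on.
absolute = [el.tag for el in a.xpath("//*")]

# './/*' is relative: it only returns descendants of the context node.
relative = [el.tag for el in a.xpath(".//*")]

print(absolute)
print(relative)
```

Called on the root element instead of `a`, the relative form returns everything except the root itself, which is why the two look nearly the same in tests run from the top of the document.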
It seems to be changing the results when I replace 
'descendant-or-self::' with './/'.  I want to include the current node 
if it matches; at least to me, that seems most logical.  Also necessary 
when I was doing microformat parsing, as a single element can have 
multiple roles.  It seems like .// excludes the current node, only 
looking at descendants.
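
That observation can be checked with a short sketch, again assuming lxml; the microformat-like markup here is invented for the example. The abbreviation `.//div` expands to `./descendant-or-self::node()/child::div`, so the context node can never be one of its own results:

```python
from lxml import etree

# Invented markup: the outer <div> is the context node and also a candidate match.
doc = etree.fromstring(
    '<div class="vcard"><div class="fn">Ian</div><span>x</span></div>'
)

# Includes the context node itself when it matches.
with_self = [el.get("class") for el in doc.xpath("descendant-or-self::div")]

# Only looks below the context node.
without_self = [el.get("class") for el in doc.xpath(".//div")]

print(with_self)
print(without_self)
```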
...
>> div:empty (no children, including text, maybe not including whitespace).
> Ouch. let me think about that one.
Yeah, I couldn't figure that one out.  I thought this might work:
     >>> xpath('E:empty')
     e[count(./children::*) = 0 and string(.) = '']
But maybe I don't understand how count() works; this isn't a valid XPath 
expression.
You want the "child" axis, not "children"; XPath has no "children"
axis. Using normalize-space(.) instead of string(.) will also treat
whitespace-only text as empty. This does assume you are ignoring
comments and PIs; I believe that's the behavior you want.
Cool, that seems to work right.
What about "e[not(*) and not(normalize-space())]" ?
Yes, that works too.
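
A quick way to compare the two forms, sketched with lxml against a made-up document (the `id` values and the `matching_ids` helper are illustrative only, not part of any translator code):

```python
from lxml import etree

# Made-up test cases covering the situations discussed above.
doc = etree.fromstring(
    "<root>"
    "<e id='empty'/>"
    "<e id='space'>   </e>"
    "<e id='text'>hi</e>"
    "<e id='child'><x/></e>"
    "<e id='comment'><!-- note --></e>"
    "</root>"
)

# Explicit comparisons, using the corrected "child" axis and normalize-space().
explicit = "//e[count(./child::*) = 0 and normalize-space(.) = '']"
# Shorter form relying on XPath's implicit conversion to boolean.
implicit = "//e[not(*) and not(normalize-space())]"

def matching_ids(expr):
    # Illustrative helper: ids of the <e> elements selected by expr.
    return [el.get("id") for el in doc.xpath(expr)]

explicit_ids = matching_ids(explicit)
implicit_ids = matching_ids(implicit)
print(explicit_ids)
print(implicit_ids)
```

Both expressions select the same elements here. The element holding only a comment counts as empty, because comments are neither element children nor part of the string value.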
That's the 'implicit conversion' I was talking about. You're relying
on an empty node-set and the empty string both converting to false.
It's a standard idiom, and pythonic, but I'm not sure you want to use
it in automatically generated code, since it means you can't
generalize the code from "has 0 children" to "has n children".
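
To illustrate the generalization point: the count() form takes a parameter directly, while not(*) only covers the zero case. This is a sketch assuming lxml; `has_n_children` is a hypothetical helper, not something from lxml or the translator under discussion:

```python
from lxml import etree

def has_n_children(tag, n):
    # Hypothetical helper: XPath for <tag> elements with exactly n element children.
    return "//%s[count(*) = %d]" % (tag, n)

# Made-up document with <e> elements holding 0, 1 and 2 children.
doc = etree.fromstring("<root><e/><e><x/></e><e><x/><y/></e></root>")

# Number of <e> elements found for each child count from 0 to 3.
counts = {n: len(doc.xpath(has_n_children("e", n))) for n in range(4)}
print(counts)
```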
In this case it's a fixed expression used for e:empty, and nothing else, 
so it seems fine.  And possibly makes the resulting expression a bit 
easier to recognize from its CSS roots.


-- 
Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org
             | Write code, do good | http://topp.openplans.org/careers
