Mailman 3 Re: [lxml-dev] Some XPath questions... - lxml - The Python XML Toolkit

June 29, 2007

Thanks, very helpful.  I'm guessing it was an oversight that you didn't 
copy the list...

Mike Meyer wrote:
...
In <468538ED.9060004@colorstudy.com>, Ian Bicking <ianb@colorstudy.com> typed:
...
I'm trying to implement CSS selectors, by translating them into XPath. 
There are some CSS expressions that I'm having a hard time with, so maybe 
someone can tell me how they might work.
Expression:
div:first-child -- means a div element when it is the first child of its 
parent.  I.e.:
<li>
  <div id="a">...</div>
  <div id="b">...</div>
</li>
It matches the first div and not the second.
I thought this could be:
descendant-or-self::*/div[0]
   or... descendant-or-self::*/div[position() = 0]
...
Those two should be equivalent; the second is a bit easier to handle 
programmatically.  But it doesn't work (doesn't match anything).
XPath positions are 1-indexed, not 0-indexed, so position() will never be
0. I understand some versions of IE get this wrong as well.
To pick out all the div elements that are the first child of their
parent, use:
//*[position() = 1 and name() = 'div']
or, with the axes spelled out so that the predicate sits on the child step:
descendant-or-self::*/*[position() = 1 and name() = 'div']
I don't know how I missed the fact they are 1-indexed... I guess it's 
become such an unusual choice these days.  But handy anyway, since CSS 
is also 1-indexed.
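
A minimal sketch of the 1-indexing point, assuming lxml is installed. The sample document and the ids() helper are made up for illustration. Note that the predicate is attached to a child step (*/div[...] or */*[...]), so position() counts among the children of each parent.

```python
from lxml import etree

# Hypothetical sample document: two list items, each with two children.
doc = etree.fromstring(
    '<ul>'
    '<li><div id="a">x</div><div id="b">y</div></li>'
    '<li><p id="c">z</p><div id="d">w</div></li>'
    '</ul>'
)


def ids(expr):
    """Return the id attributes of the elements matched by expr."""
    return [el.get("id") for el in doc.xpath(expr)]


# position() starts at 1, so a test against 0 never matches anything.
zero_based = ids("descendant-or-self::*/div[position() = 0]")

# div[1] is the first div among its parent's div children.
first_div = ids("descendant-or-self::*/div[position() = 1]")

# *[1] plus a name test: the first child, and only if that child is a div.
first_child = ids("descendant-or-self::*/*[position() = 1 and name() = 'div']")
```

With this document, first_div also picks up the div with id "d", which follows a p, so div[position() = 1] behaves like div:first-of-type. Only the last expression has the div:first-child meaning.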
...
...
Another expression:
div.foo + div -- means a div element that is the immediately next 
sibling of a div element with the class .foo.  I would translate this to:
descendant-or-self::div[@class='foo']/following-sibling::div[0]
(The class matching is actually a bit more complex, but it doesn't 
actually matter to this.)  I'm (a) not sure if this is right, because 
maybe it means the next div after the matching div, even if there's 
another element in-between, and (b) it doesn't return any results 
regardless.
Your maybe is right - following-sibling::div means the divs after, whether
or not there are other siblings in between. You then select the first element
from that list (or would, if you used a 1 instead of a 0). Same solution:
the last bit is following-sibling::*[position() = 1 and name() = 'div']
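
To make the difference concrete, here is a small sketch assuming lxml is installed. The document and the ids() helper are hypothetical; the class test is the simplified @class='foo' form from the question.

```python
from lxml import etree

# Hypothetical siblings: a p sits between the second .foo div and the next div.
doc = etree.fromstring(
    '<body>'
    '<div class="foo" id="a"/><div id="b"/>'
    '<p id="c"/>'
    '<div class="foo" id="d"/><p id="e"/><div id="f"/>'
    '</body>'
)


def ids(expr):
    """Return the id attributes of the elements matched by expr."""
    return [el.get("id") for el in doc.xpath(expr)]


# First div sibling after each .foo div, skipping over non-div siblings.
next_div = ids(
    "descendant-or-self::div[@class='foo']/following-sibling::div[position() = 1]"
)

# Immediately following sibling, kept only if it is a div (div.foo + div).
adjacent_div = ids(
    "descendant-or-self::div[@class='foo']"
    "/following-sibling::*[position() = 1 and name() = 'div']"
)
```

The first form matches "f" even though a p separates it from the .foo div; the second form drops it, which is what the CSS + combinator asks for.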
...
Another expression:
div:contains('celia') -- means a div where the textual content has the 
word 'celia' in it, case insensitive.  At least, I think it's case 
insensitive -- the CSS spec is annoyingly vague, but implementations 
seem to work like this.  I translate this to:
descendant-or-self::div[contains(css:lower-case(string(.)), 'celia']
I added the lower-case function like:
from lxml import etree

def _make_lower_case(context, s):
    return s.lower()

etree.FunctionNamespace("css")['lower-case'] = _make_lower_case
But XPath gives so few errors that it's hard to tell if it's really 
working.  The XPath expression returns some elements, but not the 
correct number from what I can tell.  Especially since when I had a bug 
and wasn't lowercasing the second argument (using 'CELIA') it still 
returned elements.
I think you've got the parens in the wrong place - the last close goes
after 'celia', not the comma.
That was just a typo in the email; copying and pasting directly:

     >>> xpath('E:contains("foo")')
     e[contains(css:lower-case(string(.)), 'foo')]

However, now that I'm writing my own tests it seems fine (I was using 
someone else's tests, and I think they were wrong; though I'm not sure 
-- you'll always get all the parents of an element if you use that, 
since if a child contains text then necessarily all their parents 
contain the same text).
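
The point about parents can be checked with a short sketch, assuming lxml is installed. lxml's documented way to expose an extension function is to register it under a namespace URI and set a prefix on the registry; the URI, the sample document, and the variable names below are placeholders made up for this example.

```python
from lxml import etree


def _lower_case(context, s):
    # s arrives as a string because the XPath passes string(.)
    return s.lower()


# Placeholder namespace URI (an assumption for this sketch), bound to "css".
ns = etree.FunctionNamespace("http://example.invalid/css-sketch")
ns.prefix = "css"
ns["lower-case"] = _lower_case

# Hypothetical document: the text lives in the inner div only.
doc = etree.fromstring(
    '<body>'
    '<div id="outer"><div id="inner">Hello CELIA</div></div>'
    '<div id="other">nobody</div>'
    '</body>'
)

expr = "descendant-or-self::div[contains(css:lower-case(string(.)), 'celia')]"
matched = [el.get("id") for el in doc.xpath(expr)]
```

Both "outer" and "inner" match, because string(.) on the outer div includes the text of its descendants. A test suite that expects only the innermost element would therefore report a mismatch.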
...
...
There's some other tricky ones I'm not sure about either, though they 
seem to be kind of working.  Things like div:only-child (when it's a div
//*[name() = 'div' and last() = 1]
This doesn't seem to be working for me:

     >>> xpath('span:only-child')
     *[name() = 'span' and (last() = 1)]

But testing with <div><span></span></div> in the document, I don't get 
anything returned.

These all work now...
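
One possible explanation for the empty result, sketched here under the assumption that the expression was evaluated from an ancestor of the div: a bare *[...] step only looks at the children of the context node. The document and helper below are hypothetical, and lxml is assumed to be installed.

```python
from lxml import etree

# Hypothetical document: one span that is an only child, two that are not.
doc = etree.fromstring(
    '<body>'
    '<div id="a"><span id="s1"/></div>'
    '<div id="b"><span id="s2"/><span id="s3"/></div>'
    '</body>'
)


def ids(expr):
    """Return the id attributes of the elements matched by expr."""
    return [el.get("id") for el in doc.xpath(expr)]


# Evaluated at <body>, this step sees only the two divs, so nothing matches.
children_only = ids("*[name() = 'span' and (last() = 1)]")

# Prefixing descendant-or-self::*/ applies the same test under every element.
any_depth = ids("descendant-or-self::*/*[name() = 'span' and (last() = 1)]")
```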
...
...
with no siblings),
...
div:last-child (no next sibling)
//*[name() = 'div' and position() = last()]
...
div:first-child (no previous sibling)
Didn't we just cover that one?
...
div:first-of-type (no preceding siblings that are divs)
//div[position() = 1]
...
div:last-of-type (no following siblings that are divs),
//div[position() = last()]
...
div:only-of-type (you are probably getting the pattern)
//div[last() = 1]
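
The four expressions above can be tried side by side. This is a sketch assuming lxml is installed; the sample document and the ids() helper are invented for the example.

```python
from lxml import etree

# Hypothetical document: two divs among mixed siblings, one div on its own.
doc = etree.fromstring(
    '<body>'
    '<p id="p1"/><div id="d1"/><div id="d2"/>'
    '<section id="s"><div id="d3"/></section>'
    '</body>'
)


def ids(expr):
    """Return the id attributes of the elements matched by expr."""
    return [el.get("id") for el in doc.xpath(expr)]


# Name test inside the predicate: position() counts all element siblings.
last_child = ids("//*[name() = 'div' and position() = last()]")

# Name test in the step: position() and last() count div siblings only.
first_of_type = ids("//div[position() = 1]")
last_of_type = ids("//div[position() = last()]")
only_of_type = ids("//div[last() = 1]")
```

The pattern is that *[name() = 'div' and ...] gives the -child variants, while div[...] gives the -of-type variants.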
...
div:empty (no children, including text, maybe not including whitespace).
Ouch. let me think about that one.
Yeah, I couldn't figure that one out.  I thought this might work:

     >>> xpath('E:empty')
     e[count(./children::*) = 0 and string(.) = '']

But maybe I don't understand how count() works; this isn't a valid XPath 
expression.
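
For what it's worth, the XPath 1.0 axis is named child, not children, and count() itself is a standard function. Below is a sketch of two possible readings of :empty, assuming lxml is installed; which reading is the intended one is left open here, and the document and helper are hypothetical.

```python
from lxml import etree

# Hypothetical divs: truly empty, whitespace only, text, and an element child.
doc = etree.fromstring(
    '<body>'
    '<div id="a"/>'
    '<div id="b"> </div>'
    '<div id="c">text</div>'
    '<div id="d"><span/></div>'
    '</body>'
)


def ids(expr):
    """Return the id attributes of the elements matched by expr."""
    return [el.get("id") for el in doc.xpath(expr)]


# Strict reading: no element children and no text at all.
strict_empty = ids("//div[count(child::*) = 0 and string(.) = '']")

# Lenient reading: no element children, whitespace-only text is ignored.
lenient_empty = ids("//div[not(*) and not(normalize-space(.))]")
```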
...
...
There's also 
div:nth-child(matcher) and div:nth-of-type(matcher), which selects among
Those should be easy with the above examples.
...
siblings with patterns like "2" (second sibling), "3n" (every third 
element), "odd" (odd elements) and some other selections.  I kind of see 
how to deal with this using position(), but I'm not sure how to do 
either nth-of-type or nth-child (and the ones I do understand I am also 
vague about).
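
A sketch of how the matcher patterns might map onto position(), assuming lxml is installed. The translations are one possible reading ("3n" as position() mod 3 = 0, "odd" as position() mod 2 = 1), and the document and helper are made up for illustration.

```python
from lxml import etree

# Hypothetical list: a p followed by five li elements.
doc = etree.fromstring(
    '<ul>'
    '<p id="p"/>'
    '<li id="i1"/><li id="i2"/><li id="i3"/><li id="i4"/><li id="i5"/>'
    '</ul>'
)


def ids(expr):
    """Return the id attributes of the elements matched by expr."""
    return [el.get("id") for el in doc.xpath(expr)]


# nth-child: count every element sibling, then require the name.
nth_child_2 = ids("//*[name() = 'li' and position() = 2]")
nth_child_3n = ids("//*[name() = 'li' and position() mod 3 = 0]")
nth_child_odd = ids("//*[name() = 'li' and position() mod 2 = 1]")

# nth-of-type: count only li siblings.
nth_of_type_2 = ids("//li[position() = 2]")
nth_of_type_odd = ids("//li[position() mod 2 = 1]")
```

Because the p occupies position 1, li:nth-child(2) lands on "i1" while li:nth-of-type(2) lands on "i2".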
-- 
Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org
             | Write code, do good | http://topp.openplans.org/careers
