Thanks, very helpful. I'm guessing it was an oversight that you didn't
copy the list...
Mike Meyer wrote:
In <468538ED.9060004@colorstudy.com>, Ian Bicking <ianb@colorstudy.com> typed:
I'm trying to implement CSS selectors, by translating them into XPath.
There are some CSS expressions that I'm having a hard time with, so maybe
someone can tell me how they might work.
Expression:
div:first-child -- means a div element when it is the first child of its
parent. I.e.:
<li>
<div id="a">...</div>
<div id="b">...</div>
</li>
It matches the first div and not the second.
I thought this could be:
descendant-or-self::*/div[0]
or... descendant-or-self::*/div[position() = 0]
Those two should be equivalent; the second is a bit easier to handle
programmatically. But it doesn't work (doesn't match anything).
XPath positions are 1-indexed, not 0-indexed, so position() will never be
0. I understand some versions of IE get this wrong as well.
To pick out all the div elements that are the first child of their
parent, use:
//*[position() = 1 and name() = 'div']
or, relative to the current node:
descendant-or-self::*/*[position() = 1 and name() = 'div']
I don't know how I missed the fact they are 1-indexed... I guess it's
become such an unusual choice these days. But handy anyway, since CSS
is also 1-indexed.
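Since XPath rarely complains about a wrong expression, it can help to have a plain-Python reference to compare the XPath results against. Here is a minimal sketch of what div:first-child should select. It uses the standard library's xml.etree.ElementTree rather than lxml, and the helper name first_child_elements is made up for illustration:

```python
import xml.etree.ElementTree as ET

def first_child_elements(root, tag):
    """Return elements named `tag` that are the first element child of their parent."""
    matches = []
    for parent in root.iter():
        children = list(parent)  # element children only, in document order
        if children and children[0].tag == tag:
            matches.append(children[0])
    return matches

doc = ET.fromstring(
    '<ul><li><div id="a">...</div><div id="b">...</div></li></ul>'
)
first_ids = [el.get("id") for el in first_child_elements(doc, "div")]
print(first_ids)  # ['a']
```

The check is "position 1 among all element siblings, and named div", which is the same condition as position() = 1 and name() = 'div' applied to a child step.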
Another expression:
div.foo + div -- means a div element that is the immediately next
sibling of a div element with the class .foo. I would translate this to:
descendant-or-self::div[@class='foo']/following-sibling::div[0]
(The class matching is actually a bit more complex, but it doesn't
actually matter to this.) I'm (a) not sure if this is right, because
maybe it means the next div after the matching div, even if there's
another element in-between, and (b) it doesn't return any results
regardless.
Your maybe is right - following-sibling::div means the divs after, whether
or not there are other siblings in between. You then select the first element
from that list (or would, if you used a 1 instead of a 0). Same solution:
the last bit is following-sibling::*[position() = 1 and name() = 'div']
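To make the difference between "next sibling" and "next div sibling" concrete, here is a small reference sketch in plain Python (standard-library ElementTree, not lxml). The helper names has_class and adjacent_div_after_foo are hypothetical, and the class test is the token-based one CSS uses rather than the simplified @class='foo' comparison:

```python
import xml.etree.ElementTree as ET

def has_class(el, name):
    # CSS class matching is token-based: class="foo bar" matches .foo
    return name in el.get("class", "").split()

def adjacent_div_after_foo(root):
    """Return the elements that `div.foo + div` should select."""
    matches = []
    for parent in root.iter():
        children = list(parent)
        # Walk consecutive sibling pairs; only the immediate next sibling counts.
        for prev, nxt in zip(children, children[1:]):
            if prev.tag == "div" and has_class(prev, "foo") and nxt.tag == "div":
                matches.append(nxt)
    return matches

doc = ET.fromstring(
    "<body>"
    '<div class="foo" id="a"/><div id="b"/>'      # b directly follows a
    '<div class="foo" id="c"/><p/><div id="d"/>'  # a <p> separates c and d
    "</body>"
)
adjacent_ids = [el.get("id") for el in adjacent_div_after_foo(doc)]
print(adjacent_ids)  # ['b']
```

Here d is the first div after c, but it is not selected because a p sits in between; that is the case where following-sibling::div[1] would return too much.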
Another expression:
div:contains('celia') -- means a div where the textual content has the
word 'celia' in it, case insensitive. At least, I think it's case
insensitive -- the CSS spec is annoyingly vague, but implementations
seem to work like this. I translate this to:
descendant-or-self::div[contains(css:lower-case(string(.)), 'celia']
I added the lower-case function like:
from lxml import etree

def _make_lower_case(context, s):
    return s.lower()
etree.FunctionNamespace("css")['lower-case'] = _make_lower_case
But XPath gives so few errors that it's hard to tell if it's really
working. The XPath expression returns some elements, but not the
correct number from what I can tell. Especially since when I had a bug
and wasn't lowercasing the second argument (using 'CELIA') it still
returned elements.
I think you've got the parens in the wrong place - the last close goes
after 'celia', not the comma.
That was just a typo in the email; copying and pasting directly:
>>> xpath('E:contains("foo")')
e[contains(css:lower-case(string(.)), 'foo')]
However, now that I'm writing my own tests it seems fine (I was using
someone else's tests, and I think they were wrong; though I'm not sure
-- you'll always get all the parents of an element if you use that,
since if a child contains text then necessarily all their parents
contain the same text).
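The point about parents can be checked with a short plain-Python sketch. It uses the standard library's ElementTree instead of lxml, and contains_text is a made-up helper that lowercases both sides, on the assumption that :contains is meant to be case insensitive:

```python
import xml.etree.ElementTree as ET

def contains_text(root, tag, needle):
    """Elements named `tag` whose full text content contains `needle`, ignoring case."""
    needle = needle.lower()
    return [
        el for el in root.iter(tag)
        # itertext() yields the text of the element and all its descendants,
        # which is roughly what string(.) gives in XPath
        if needle in "".join(el.itertext()).lower()
    ]

doc = ET.fromstring(
    '<div id="outer"><div id="inner">Hello Celia</div><div id="other">Bob</div></div>'
)
hit_ids = [el.get("id") for el in contains_text(doc, "div", "CELIA")]
print(hit_ids)  # ['outer', 'inner']
```

The outer div is returned only because its descendant contains the text, so a test suite that expects a single match for this document is probably counting only the innermost element.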
There's some other tricky ones I'm not sure about either, though they
seem to be kind of working. Things like div:only-child (when it's a div
with no siblings),
//*[name() = 'div' and last() = 1]
This doesn't seem to be working for me:
>>> xpath('span:only-child')
*[name() = 'span' and (last() = 1)]
But testing with <div><span></span></div> in the document, I don't get
anything returned.
These all work now...
div:last-child (no next sibling)
//*[name() = 'div' and position() = last()]
div:first-child (no previous sibling)
Didn't we just cover that one?
div:first-of-type (no preceding siblings that are divs),
//div[position() = 1]
div:last-of-type (no following siblings that are divs),
//div[position() = last()]
div:only-of-type (you are probably getting the pattern)
//div[last() = 1]
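The pattern in these expressions is that position() and last() are counted over whatever the step selects: all element siblings for the //*[...] forms, and only the div siblings for the //div[...] forms. A minimal plain-Python sketch of that idea, useful as a cross-check for the XPath output (standard-library ElementTree, hypothetical helper name structural_matches):

```python
import xml.etree.ElementTree as ET

def structural_matches(root, tag):
    """Map each pseudo-class name to the ids of the `tag` elements it selects."""
    result = {name: [] for name in (
        "first-child", "last-child", "only-child",
        "first-of-type", "last-of-type", "only-of-type")}
    for parent in root.iter():
        children = list(parent)
        same_type = [c for c in children if c.tag == tag]
        for el in same_type:
            # position()/last() over all element siblings: the //*[...] forms
            pos, last = children.index(el) + 1, len(children)
            # position()/last() over same-named siblings: the //div[...] forms
            tpos, tlast = same_type.index(el) + 1, len(same_type)
            checks = {
                "first-child": pos == 1,
                "last-child": pos == last,
                "only-child": last == 1,
                "first-of-type": tpos == 1,
                "last-of-type": tpos == tlast,
                "only-of-type": tlast == 1,
            }
            for name, ok in checks.items():
                if ok:
                    result[name].append(el.get("id"))
    return result

doc = ET.fromstring(
    "<body>"
    '<section><p/><div id="a"/><div id="b"/></section>'
    '<section><div id="c"/></section>'
    "</body>"
)
result = structural_matches(doc, "div")
print(result["first-of-type"])  # ['a', 'c']
```

In this document a is the first div of its parent but not the first child, which is exactly the difference between first-of-type and first-child.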
div:empty (no children, including text, maybe not including whitespace).
Ouch. Let me think about that one.
Yeah, I couldn't figure that one out. I thought this might work:
>>> xpath('E:empty')
e[count(./children::*) = 0 and string(.) = '']
But maybe I don't understand how count() works; this isn't a valid XPath
expression.
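One thing that stands out in that expression is the axis name: XPath spells it child::, not children::, so count(./child::*) = 0 and string(.) = '' would at least parse. Whether whitespace-only text should count as content is a separate question. Here is a plain-Python sketch of both readings, using standard-library ElementTree and a made-up helper is_empty whose ignore_whitespace flag is an assumption, not something from the CSS spec:

```python
import xml.etree.ElementTree as ET

def is_empty(el, ignore_whitespace=False):
    """True if `el` has no element children and no text content."""
    if len(el) > 0:  # any element child disqualifies it
        return False
    # With no children, el.text holds all of the element's text.
    text = el.text or ""
    if ignore_whitespace:
        text = text.strip()
    return text == ""

doc = ET.fromstring(
    '<div><p id="a"></p><p id="b"> </p><p id="c">x</p><p id="d"><b/></p></div>'
)
strict_ids = [p.get("id") for p in doc.iter("p") if is_empty(p)]
loose_ids = [p.get("id") for p in doc.iter("p") if is_empty(p, ignore_whitespace=True)]
print(strict_ids, loose_ids)  # ['a'] ['a', 'b']
```

The strict reading corresponds to string(.) = '', and the loose one to comparing normalize-space(.) against the empty string instead.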
There's also
div:nth-child(matcher) and div:nth-of-type(matcher), which select among
siblings with patterns like "2" (second sibling), "3n" (every third
element), "odd" (odd elements) and some other selections. I kind of see
how to deal with this using position(), but I'm not sure how to do
either nth-of-type or nth-child (and the ones I do understand I am also
vague about).
Those should be easy with the above examples.
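For the matcher itself, one way to think about it is as a pair (a, b) meaning "positions a*n + b for n = 0, 1, 2, ...". The position is then counted over all element siblings for nth-child and over same-named siblings for nth-of-type. A rough sketch of the parsing and the position test follows. The names parse_nth and nth_matches are hypothetical, and it only handles the common forms ("2", "3n", "2n+1", "-n+3", "odd", "even"):

```python
import re

def parse_nth(matcher):
    """Parse an nth-child argument into (a, b), meaning positions a*n + b for n >= 0."""
    m = matcher.strip().lower().replace(" ", "")
    if m == "odd":
        return 2, 1
    if m == "even":
        return 2, 0
    match = re.fullmatch(r"([+-]?\d*)n([+-]\d+)?", m)
    if match:
        a_text, b_text = match.groups()
        # A bare "n", "+n" or "-n" means a coefficient of 1 or -1.
        a = int(a_text + "1") if a_text in ("", "+", "-") else int(a_text)
        return a, int(b_text or 0)
    return 0, int(m)  # plain integer such as "2"

def nth_matches(matcher, position):
    """True if the 1-based `position` is selected by `matcher`."""
    a, b = parse_nth(matcher)
    if a == 0:
        return position == b
    n, remainder = divmod(position - b, a)
    return remainder == 0 and n >= 0

selected = [p for p in range(1, 8) if nth_matches("3n", p)]
print(selected)  # [3, 6]
```

In XPath terms the same test would be a comparison on position(), for example position() mod 3 = 0 for "3n" or position() = 2 for "2".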
--
Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org
| Write code, do good | http://topp.openplans.org/careers