Using lxml trunk:

    doc.xpath('descendant-or-self::*[starts-with(lower-case(@href), "javascript:")]')

works, but:

    doc.xpath('descendant-or-self::*[matches(@href, "^javascript:", "i")]')

returns ["i"]. This does not seem right...?

-- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org | Write code, do good | http://topp.openplans.org/careers
Ian Bicking wrote:
Well, maybe this one doesn't work either (returns 1/0). Now I'm just confused.
Ian Bicking wrote:
Adding to this, I'm trying to do the rel matching with:

    etree.XPath("descendant-or-self::a[fn:lower-case(@rel)=$rel]")

I *have* to use fn:lower-case, not just lower-case, otherwise I get XPathEvalError: Unregistered function. And it doesn't matter whether I use it or not; it doesn't affect the outcome at all. Similarly, upper-case doesn't change anything.

I also tried using

    XPath(r'...[translate(@class, "\n\t\r", " ")]')

and that didn't work. The \n etc. don't seem to be interpreted; it only works if I include the actual characters. (I then noticed normalize-space, which is better, but it still seems odd.)

How literals are supposed to work in XPath is rather unclear to me; I guess \ isn't an escape character? The spec says use 'something'' if you want to include a literal ' in a string. Which I assume in an XML attribute you'd have to do as 'something'', since it probably gets double-unescaped? Bah.
Hi Ian, Ian Bicking wrote:
IIRC, "lower-case()" is XPath 2.0. libxml2 supports XPath 1.0 only, so there just is no such function. It's easy to implement that in Python, though:

    from lxml import etree

    def make_lower_case(ctxt, s):
        # s is a plain string because the expression wraps @rel in string(...)
        return s.lower()

    etree.FunctionNamespace("myNs")["lower-case"] = make_lower_case

    find = etree.XPath(
        "descendant-or-self::a[fn:lower-case(string(@rel))=$rel]",
        namespaces={'fn': 'myNs'})

(Note the call to "string(...)" to make sure we get a string value here, not a node set.)

BTW, I get a reproducible crash with the above under libxml2 2.6.27, but it works with 2.6.28. Sigh...
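For anyone who would rather stay within plain XPath 1.0, one possible alternative to the extension function is the built-in translate(), which can fold ASCII letters to lower case. This is only a sketch, not what either poster used: the sample document and the variable names `upper` and `lower` are made up for illustration, and non-ASCII letters are left untouched.

```python
import string

from lxml import etree

# Hypothetical sample document, for illustration only.
doc = etree.fromstring(
    '<div><a rel="NoFollow" href="#a">a</a><a rel="next" href="#b">b</a></div>'
)

# translate() replaces each character found in $upper with the character
# at the same position in $lower, i.e. it lower-cases ASCII letters.
find = etree.XPath(
    "descendant-or-self::a[translate(@rel, $upper, $lower) = $rel]"
)

# Keyword arguments are passed to the expression as XPath variables.
matches = find(
    doc,
    upper=string.ascii_uppercase,
    lower=string.ascii_lowercase,
    rel="nofollow",
)
hrefs = [a.get("href") for a in matches]
print(hrefs)
```

Passing the alphabets as variables keeps the expression short; they could just as well be written inline as string literals.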
Hmm, you didn't try without the r'', did you?

    XPath('...[translate(@class, "\n\t\r", " ")]')

That should work, as it leaves it to Python to handle the character escapes.

Stefan
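The difference between the raw and the non-raw string can be checked with a small experiment. The sketch below assumes a made-up one-element document whose class attribute contains a real newline and tab (written as character references so the XML parser does not normalize them to spaces). It also passes three spaces as the third argument, because translate() deletes any character that has no counterpart there.

```python
from lxml import etree

# Hypothetical sample: class is "a<newline>b<tab>c".
doc = etree.fromstring('<p class="a&#10;b&#9;c"/>')

# Normal string: Python turns \n, \t, \r into the actual control characters
# before XPath ever sees the expression.
cooked = etree.XPath('translate(@class, "\n\t\r", "   ")')

# Raw string: XPath receives a literal backslash followed by a letter.
# XPath 1.0 string literals have no escape syntax, so nothing here matches
# the newline or the tab in the attribute value.
raw = etree.XPath(r'translate(@class, "\n\t\r", "   ")')

cooked_result = cooked(doc)
raw_result = raw(doc)
print(repr(cooked_result))
print(repr(raw_result))
```

With the raw string, the attribute value comes back unchanged, which matches the behaviour described earlier in the thread.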
Hi, Ian Bicking wrote:
doc.xpath('descendant-or-self::*[matches(@href, "^javascript:", "i")]')
Returns ["i"]. This does not seem right...?
You're not calling the right function. The EXSLT functions are in the EXSLT namespaces, so you have to do something like

    xpath('regexp:test(., "^huhu", "i")',
          namespaces={'regexp': 'http://exslt.org/regular-expressions'})

Stefan
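Applied to the original question, a case-insensitive check for javascript: links might look roughly like the following. The sample document and the `re` prefix are assumptions made for illustration; the namespace URI is the EXSLT regular-expressions one given above, and `test` is the EXSLT function that returns a boolean.

```python
from lxml import etree

REGEXP_NS = "http://exslt.org/regular-expressions"

# Hypothetical sample document, for illustration only.
doc = etree.fromstring(
    "<div>"
    '<a href="JavaScript:alert(1)">bad</a>'
    '<a href="http://example.org/">ok</a>'
    "</div>"
)

# string(@href) hands the function a string rather than a node set;
# the "i" flag makes the match case-insensitive.
bad = doc.xpath(
    'descendant-or-self::*[re:test(string(@href), "^javascript:", "i")]',
    namespaces={"re": REGEXP_NS},
)
texts = [el.text for el in bad]
print(texts)
```

Any prefix works as long as it is mapped to the EXSLT namespace in the `namespaces` dictionary.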
participants (2): Ian Bicking, Stefan Behnel