Incorrect XPath translation of some CSS selectors
Hi, Please consider the following session:
import lxml.cssselect lxml.cssselect.css_to_xpath('a >#b') u"descendant-or-self::a/*[@id = 'b']" lxml.cssselect.css_to_xpath('a >* #b') u"descendant-or-self::a/*//*[@id = 'b']" lxml.cssselect.css_to_xpath('a > #b') u"descendant-or-self::a/*//*[@id = 'b']"
The first and second translations are correct, but the third is not. The third CSS selector is equivalent to the first, not the second as the cssselect parser seems to think. The same bug happens with a class instead of an ID, and maybe in other cases. I can adapt the existing test suite so that is fails where it should not, if it helps. PS: should bug discussion go here on the mailing list or on Github’s or Launchpad’s bug-tracker? Regards, -- Simon Sapin
Simon Sapin, 04.11.2011 00:33:
Please consider the following session:
import lxml.cssselect lxml.cssselect.css_to_xpath('a>#b') u"descendant-or-self::a/*[@id = 'b']" lxml.cssselect.css_to_xpath('a>* #b') u"descendant-or-self::a/*//*[@id = 'b']" lxml.cssselect.css_to_xpath('a> #b') u"descendant-or-self::a/*//*[@id = 'b']"
That's using the unpatched 2.3.1, right?
The first and second translations are correct, but the third is not. The third CSS selector is equivalent to the first, not the second as the cssselect parser seems to think.
The same bug happens with a class instead of an ID, and maybe in other cases.
Looks like a bug to me, yes, although I must say that the CSS spec has always been full of surprises to me. I'd rather verify this against the spec first.
I can adapt the existing test suite so that is fails where it should not, if it helps.
A tested fix is better than a test is better than a report. All three put together are perfect. If you want to provide a test patch, it's best to open a launchpad ticket and put it there.
PS: should bug discussion go here on the mailing list or on Github’s or Launchpad’s bug-tracker?
Depends :) If it's a problem, and you don't know if it's a bug or not, ask on this list. If it's a sure bug that you only want to report so that it won't get lost, go to the launchpad bug tracker. If it's a sure bug (or missing feature) and you have a patch to fix it, consider opening a pull request on github directly. Stefan
Le 04/11/2011 00:58, Stefan Behnel a écrit :
That's using the unpatched 2.3.1, right?
Yes. Results are similar (but still incorrect) whatever the translation of the descendant selector. This one is just easier to read. The point is that a > #b is considered the same as a > * #b (with a descendant combinator before #b) rather than a>#b
Looks like a bug to me, yes, although I must say that the CSS spec has always been full of surprises to me.
Indeed! Visual rendering is a lot of "fun" too :]
I'd rather verify this against the spec first.
The child combinator is defined here: http://www.w3.org/TR/selectors/#child-combinators Though in the examples rather than the formal spec, it seems clear to me that optional whitespace around ">" has no meaning.
I can adapt the existing test suite so that is fails where it should not, if it helps.
A tested fix is better than a test is better than a report. All three put together are perfect. If you want to provide a test patch, it's best to open a launchpad ticket and put it there.
PS: should bug discussion go here on the mailing list or on Github’s or Launchpad’s bug-tracker?
Depends :)
If it's a problem, and you don't know if it's a bug or not, ask on this list. If it's a sure bug that you only want to report so that it won't get lost, go to the launchpad bug tracker. If it's a sure bug (or missing feature) and you have a patch to fix it, consider opening a pull request on github directly.
I’ll provide a test case later. I had a quick look at the code for a fix, but it’s not clear to me yet how the tokenizer handles whitespace. Regards, -- Simon
Le 04/11/2011 01:18, Simon Sapin a écrit :
I’ll provide a test case later. I had a quick look at the code for a fix, but it’s not clear to me yet how the tokenizer handles whitespace.
Hi, Here is a pull request with a test case and a fix: https://github.com/lxml/lxml/pull/19 It was not so hard once I played a bit with the tokenizer. Regards, -- Simon Sapin
participants (2)
-
Simon Sapin
-
Stefan Behnel