
Hi, everyone. I've been searching around for a simple XPath expression validator, just to check that the XPaths we hand-write are really valid. lxml looks like it might do well at this. However, I did find a case where an invalid expression doesn't throw:

[ajvincent@localhost ~]$ python
Python 2.7.8 (default, Nov 10 2014, 08:19:18)
[GCC 4.9.2 20141101 (Red Hat 4.9.2-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from lxml import etree
>>> etree.XPath("//b[contains(.)]")
//b[contains(.)]

The contains function, as I understand it, takes exactly two arguments.
http://www.w3.org/TR/xpath/#section-String-Functions

I reproduced this with the python-lxml-3.3.6-1.fc21 package that Fedora 21 Linux provides, and on my MacBook with the py-lxml-3.4.1_0 MacPorts distribution.

Please advise: is this a legitimate bug in lxml? If so, I'll file in the bug tracker.

--
"The first step in confirming there is a bug in someone else's work is confirming there are no bugs in your own."
-- Alexander J. Vincent, June 30, 2001

Hi,
> However, I did find a case where an invalid expression doesn't throw:
>
> [ajvincent@localhost ~]$ python
> Python 2.7.8 (default, Nov 10 2014, 08:19:18)
> [GCC 4.9.2 20141101 (Red Hat 4.9.2-1)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> from lxml import etree
> >>> etree.XPath("//b[contains(.)]")
> //b[contains(.)]
>
> The contains function, as I understand it, takes exactly two arguments.
> http://www.w3.org/TR/xpath/#section-String-Functions
>
> I reproduced this with the python-lxml-3.3.6-1.fc21 package that Fedora 21 Linux provides, and on my MacBook with the py-lxml-3.4.1_0 MacPorts distribution.
>
> Please advise: is this a legitimate bug in lxml? If so, I'll file in the bug tracker.

An error will only manifest when actually evaluating the XPath:

>>> etree.XPath("//root[substring()]")(etree.fromstring("<root/>"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "xpath.pxi", line 445, in lxml.etree.XPath.__call__ (src/lxml/lxml.etree.c:153576)
  File "xpath.pxi", line 227, in lxml.etree._XPathEvaluatorBase._handle_result (src/lxml/lxml.etree.c:150914)
  File "xpath.pxi", line 212, in lxml.etree._XPathEvaluatorBase._raise_eval_error (src/lxml/lxml.etree.c:150713)
lxml.etree.XPathEvalError: Invalid number of arguments
>>> etree.XPath("//root[contains()]")(etree.fromstring("<root/>"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "xpath.pxi", line 445, in lxml.etree.XPath.__call__ (src/lxml/lxml.etree.c:153576)
  File "xpath.pxi", line 227, in lxml.etree._XPathEvaluatorBase._handle_result (src/lxml/lxml.etree.c:150914)
  File "xpath.pxi", line 212, in lxml.etree._XPathEvaluatorBase._raise_eval_error (src/lxml/lxml.etree.c:150713)
lxml.etree.XPathEvalError: Invalid number of arguments

Note that you won't run into the error when the XPath predicate is never applied, i.e. when contains() is not even called:

>>> etree.XPath("//b[contains()]")(etree.fromstring("<root/>"))
[]

Same behaviour with e.g. a non-existent function:

>>> etree.XPath("//b[foobar()]")(etree.fromstring("<root/>"))
[]

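To put the cases above side by side, here is a minimal sketch of a helper that first compiles an expression and then evaluates it against a sample document. The name classify_xpath and the sample documents are made up for illustration and are not part of lxml; only etree.XPath, etree.fromstring, XPathSyntaxError and XPathEvalError are lxml's own. The verdict depends on the sample document, since a predicate that is never applied cannot fail.

```python
from lxml import etree


def classify_xpath(expression, sample_xml="<root/>"):
    """Report where (if anywhere) an XPath expression fails.

    Hypothetical helper for illustration, not part of lxml.
    """
    try:
        compiled = etree.XPath(expression)  # parse/compile step
    except etree.XPathSyntaxError:
        return "syntax error"
    try:
        compiled(etree.fromstring(sample_xml))  # evaluation step
    except etree.XPathEvalError:
        return "evaluation error"
    return "ok"


if __name__ == "__main__":
    for expr in ("//b[", "//root[contains()]", "//b[contains()]"):
        print(expr, "->", classify_xpath(expr))
```

Passing a sample document that actually contains the elements the expression selects (for example "<root><b/></root>" for "//b[contains()]") is what makes the predicate run, so a validator built along these lines is only as good as its sample data.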
So calling a function with the wrong number of arguments is an XPath runtime error, not an XPath compile-time error. Indeed, you cannot know at compile time whether there is an error or not:

>>> etree.XPath("//root[foobar(.)]")(etree.fromstring("<root/>"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "xpath.pxi", line 445, in lxml.etree.XPath.__call__ (src/lxml/lxml.etree.c:153576)
  File "xpath.pxi", line 227, in lxml.etree._XPathEvaluatorBase._handle_result (src/lxml/lxml.etree.c:150914)
  File "xpath.pxi", line 212, in lxml.etree._XPathEvaluatorBase._raise_eval_error (src/lxml/lxml.etree.c:150713)
lxml.etree.XPathEvalError: Unregistered function
>>> def foobar(context, a1):
...     return a1
...
>>> ns = etree.FunctionNamespace(None)
>>> ns['foobar'] = foobar
>>> etree.XPath("//root[foobar(.)]")(etree.fromstring("<root/>"))
[<Element root at 0x7fe42b7e0830>]

(Although it seems not to be possible to override XPath built-in functions this way...)

Compare with Python's behaviour:

>>> def f(x, y, z):
...     print x, y, z
...
>>> compile('f(a)', "", "eval")
<code object <module> at 0x7f5cc48bed30, file "", line 1>
>>> eval(compile('f(a)', "", "eval"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "", line 1, in <module>
NameError: name 'a' is not defined
>>> a = 3
>>> eval(compile('f(a)', "", "eval"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "", line 1, in <module>
TypeError: f() takes exactly 3 arguments (1 given)

Or just your "plain" Python code without compile/eval of strings:

>>> def call_f():
...     return f(1)
...
>>> call_f()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in call_f
TypeError: f() takes exactly 3 arguments (1 given)
>>> def other_f(x):
...     print x
...
>>> f = other_f  # re-bind name f in call_f's (global) scope to other_f function object
>>> call_f()
1

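For anyone reading along on Python 3, the same point can be sketched as a script rather than an interactive session. This is only an illustration: the namespace dict and the "<example>" filename are arbitrary choices, and the functions return their arguments instead of printing them so the results can be inspected.

```python
def f(x, y, z):
    return (x, y, z)


# Compiling succeeds even though the call passes too few arguments
# and the name 'a' does not exist yet.
code = compile("f(a)", "<example>", "eval")
namespace = {"f": f}

try:
    eval(code, namespace)
except NameError as exc:
    first_error = type(exc).__name__   # the name lookup fails first

namespace["a"] = 3
try:
    eval(code, namespace)
except TypeError as exc:
    second_error = type(exc).__name__  # arity is only checked at call time


def other_f(x):
    return x


# Re-binding f makes the very same code object valid.
namespace["f"] = other_f
result = eval(code, namespace)

print(first_error, second_error, result)
```

The code object is compiled once and never changes; whether it fails depends entirely on what the names are bound to when it runs, which is the same situation as an XPath expression calling a function that may or may not be registered.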
Holger

participants (2)
- Alex Vincent
- Holger Joukl