lxml.etree.XPathEvalError: Invalid expression for correct XPath expression on large XML file

Hi,

I am using Python/lxml to process large (~300MB) XML files and extract information with XPath. I stumbled upon a strange error that I cannot make any sense of: all XPath expressions using a "where" clause (square brackets) fail with the error message "lxml.etree.XPathEvalError: Invalid expression" (see stack traces below). This happens with different versions, on 32-bit and 64-bit, and on different OSs.

I cannot reproduce this behaviour with small XML files, and I could not find any information about this. Has anyone experienced something similar? Can anybody determine some useful information from the stack trace?

Many thanks,
Dennis

Tested environments:

Linux
=====
>>> xml = etree.parse('stammdaten.xml')
>>> xml.xpath('//foo[@id]')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "lxml.etree.pyx", line 2115, in lxml.etree._ElementTree.xpath (src/lxml/lxml.etree.c:57654)
  File "xpath.pxi", line 370, in lxml.etree.XPathDocumentEvaluator.__call__ (src/lxml/lxml.etree.c:146564)
  File "xpath.pxi", line 238, in lxml.etree._XPathEvaluatorBase._handle_result (src/lxml/lxml.etree.c:144962)
  File "xpath.pxi", line 224, in lxml.etree._XPathEvaluatorBase._raise_eval_error (src/lxml/lxml.etree.c:144817)
lxml.etree.XPathEvalError: Invalid expression
>>> print("%-20s: %s" % ('Python', sys.version_info))
Python              : sys.version_info(major=3, minor=4, micro=3, releaselevel='final', serial=0)
>>> print("%-20s: %s" % ('lxml.etree', etree.LXML_VERSION))
lxml.etree          : (3, 3, 3, 0)
>>> print("%-20s: %s" % ('libxml used', etree.LIBXML_VERSION))
libxml used         : (2, 9, 1)
>>> print("%-20s: %s" % ('libxml compiled', etree.LIBXML_COMPILED_VERSION))
libxml compiled     : (2, 9, 1)
>>> print("%-20s: %s" % ('libxslt used', etree.LIBXSLT_VERSION))
libxslt used        : (1, 1, 28)
>>> print("%-20s: %s" % ('libxslt compiled', etree.LIBXSLT_COMPILED_VERSION))
libxslt compiled    : (1, 1, 28)

Windows
=======
>>> xml = etree.parse('stammdaten.xml')
>>> xml.xpath('//*[@id]')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "src/lxml/lxml.etree.pyx", line 2272, in lxml.etree._ElementTree.xpath (src\lxml\lxml.etree.c:70786)
  File "src/lxml/xpath.pxi", line 359, in lxml.etree.XPathDocumentEvaluator.__call__ (src\lxml\lxml.etree.c:179148)
  File "src/lxml/xpath.pxi", line 227, in lxml.etree._XPathEvaluatorBase._handle_result (src\lxml\lxml.etree.c:177421)
lxml.etree.XPathEvalError: Invalid expression
>>> print("%-20s: %s" % ('Python', sys.version_info))
Python              : sys.version_info(major=3, minor=4, micro=0, releaselevel='final', serial=0)
>>> print("%-20s: %s" % ('lxml.etree', etree.LXML_VERSION))
lxml.etree          : (3, 5, 0, 0)
>>> print("%-20s: %s" % ('libxml used', etree.LIBXML_VERSION))
libxml used         : (2, 9, 2)
>>> print("%-20s: %s" % ('libxml compiled', etree.LIBXML_COMPILED_VERSION))
libxml compiled     : (2, 9, 2)
>>> print("%-20s: %s" % ('libxslt used', etree.LIBXSLT_VERSION))
libxslt used        : (1, 1, 28)
>>> print("%-20s: %s" % ('libxslt compiled', etree.LIBXSLT_COMPILED_VERSION))
libxslt compiled    : (1, 1, 28)

Hi Dennis,
I'm afraid you probably won't get much help unless you can provide a minimal example that reproduces the error.

I suspect the "small" XML files differ from the "large" ones in such a way that your XPath predicates (the parts in square brackets) are never evaluated for the small files, so you don't run into the problem there. E.g. for
etree.XPath('//foo[bar()]')
you won't see a problem with the predicate unless you run it on an XML document that actually *has* foo elements:
>>> etree.XPath('//foo[bar()]')(etree.fromstring('<root/>'))
[]
>>> etree.XPath('//foo[bar()]')(etree.fromstring('<root><foo/></root>'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "xpath.pxi", line 445, in lxml.etree.XPath.__call__ (src/lxml/lxml.etree.c:153576)
  File "xpath.pxi", line 227, in lxml.etree._XPathEvaluatorBase._handle_result (src/lxml/lxml.etree.c:150914)
  File "xpath.pxi", line 212, in lxml.etree._XPathEvaluatorBase._raise_eval_error (src/lxml/lxml.etree.c:150713)
lxml.etree.XPathEvalError: Unregistered function
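
The same comparison can be wrapped in a small script, which may be a convenient starting point for a minimal reproducer. This is only a sketch: the helper name evaluate() is made up for illustration, and bar() is deliberately a function that is not registered anywhere. It does not reproduce the "Invalid expression" error from the large file, it only shows that a predicate problem stays hidden as long as no element matches the step before it.

```python
from lxml import etree


def evaluate(expr, xml_text):
    """Hypothetical helper: return ('ok', result) or ('error', message)."""
    # Compiling succeeds; bar() is only looked up when the predicate is evaluated.
    xpath = etree.XPath(expr)
    try:
        return ('ok', xpath(etree.fromstring(xml_text)))
    except etree.XPathEvalError as exc:
        return ('error', str(exc))


# No <foo> elements: the predicate is never evaluated, so nothing fails.
print(evaluate('//foo[bar()]', '<root/>'))

# One <foo> element: the predicate is evaluated and the error surfaces.
print(evaluate('//foo[bar()]', '<root><foo/></root>'))
```

Feeding progressively larger excerpts of the real file to such a helper is one way to narrow down which part of the document triggers the failure.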

Best regards
Holger

Participants (2): Dennis Walter, Holger Joukl