lxml.etree.iterparse schema validation does not validate

Hi,
I'm not sure whether I'm using the iterparse interface correctly, but I looked at its tests (src/lxml/tests/test_xmlschema.py) and
found that validating through iterparse with a schema gives a different result than schema.validate or schema.assertValid on a tree parsed via etree.parse.
Information that is probably needed to reproduce (run on Xubuntu 18.04, with or without lxml manually updated via pip install -U lxml):
Python 2.7.17 (default, Jul 20 2020, 15:37:01)
[GCC 7.5.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> from lxml import etree
>>> print("%-20s: %s" % ('Python', sys.version_info))
Python              : sys.version_info(major=2, minor=7, micro=17, releaselevel='final', serial=0)
>>> print("%-20s: %s" % ('lxml.etree', etree.LXML_VERSION))
lxml.etree          : (4, 5, 2, 0)
>>> print("%-20s: %s" % ('libxml used', etree.LIBXML_VERSION))
libxml used         : (2, 9, 10)
>>> print("%-20s: %s" % ('libxml compiled', etree.LIBXML_COMPILED_VERSION))
libxml compiled     : (2, 9, 10)
>>> print("%-20s: %s" % ('libxslt used', etree.LIBXSLT_VERSION))
libxslt used        : (1, 1, 34)
>>> print("%-20s: %s" % ('libxslt compiled', etree.LIBXSLT_COMPILED_VERSION))
libxslt compiled    : (1, 1, 34)
The minimal script one could use to reproduce (based on your test test_xmlschema_iterparse_fail in src/lxml/tests/test_xmlschema.py):
from lxml import etree
from io import BytesIO
import StringIO

schema = etree.parse(StringIO.StringIO('''
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="a" type="AType"/>
  <xsd:complexType name="AType">
    <xsd:sequence>
      <xsd:element name="b" type="xsd:string" />
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
'''))
schema = etree.XMLSchema(schema)

raw_tree = BytesIO('<a><c></c></a>')
etree.iterparse(raw_tree, schema=schema)
tree = etree.parse(raw_tree)
if not schema.validate(tree):
    print('Error: Different validation results:')
    schema.assertValid(tree)
After some more research I found that test_xmlschema_iterparse_fail itself also seems to be broken.
I rewrote it (below) to use with self.assertRaises, so that an expected exception that is not raised makes the test fail.
With this version you can reproduce the bug (if you agree that it is one) within your test framework, since the iterparse call raises no exception:
    def test_xmlschema_iterparse_fail(self):
        schema = self.parse('''
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="a" type="AType"/>
  <xsd:complexType name="AType">
    <xsd:sequence>
      <xsd:element name="b" type="xsd:string" />
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
''')
        schema = etree.XMLSchema(schema)
        with self.assertRaises(etree.XMLSyntaxError):
            etree.iterparse(BytesIO('<a><c></c></a>'), schema=schema)
Running make test then gives something like:
FAIL: test_xmlschema_iterparse_fail (lxml.tests.test_xmlschema.ETreeXMLSchemaTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.7/unittest/case.py", line 329, in run
    testMethod()
  File "/home/user/git/lxml/src/lxml/tests/test_xmlschema.py", line 289, in test_xmlschema_iterparse_fail
    etree.iterparse(BytesIO('<a><c></c></a>'), schema=schema)
  File "/usr/lib/python2.7/unittest/case.py", line 116, in __exit__
    "{0} not raised".format(exc_name))
AssertionError: XMLSyntaxError not raised
Could you advise me what to do next? Should I file a bug for this? Or is my expectation wrong that iterparse should raise an exception on a schema violation?
Best regards,
Kai

Kai Hillmann wrote on 28.07.20 at 10:58:
> I'm not sure whether I use the iterparse interface correctly but I did a look inside the tests for it (src/lxml/tests/test_xmlschema.py) and found out that the validation results of iterparse provided with a schema and schema.validate or schema.assertValid using a parsed tree (via etree.parse) differ.
>
> schema = etree.XMLSchema(schema)
> raw_tree = BytesIO('<a><c></c></a>')
> etree.iterparse(raw_tree, schema=schema)
> tree = etree.parse(raw_tree)
> if not schema.validate(tree):
>     print('Error: Different validation results:')
>     schema.assertValid(tree)

Note that etree.iterparse() returns an iterator that parses incrementally. Creating it does not parse the input by itself. To trigger the parsing (and thus any parsing or validation errors), you have to iterate over it. That is what the call to list() in the test function does.
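
A minimal sketch of this behaviour, assuming Python 3 and an installed lxml (the names XSD, events and raised are illustrative only). It uses the schema and document from the thread: creating the iterator raises nothing, while consuming it with list() surfaces the schema violation as XMLSyntaxError.

```python
from io import BytesIO
from lxml import etree

# Schema from the thread: <a> must contain exactly one <b> string element.
XSD = b'''<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="a" type="AType"/>
  <xsd:complexType name="AType">
    <xsd:sequence>
      <xsd:element name="b" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>'''

schema = etree.XMLSchema(etree.parse(BytesIO(XSD)))

# Only the iterator is created here; no exception is raised at this point.
events = etree.iterparse(BytesIO(b'<a><c></c></a>'), schema=schema)

# Consuming the iterator drives the parser, so the schema violation
# (<c> where <b> is expected) is reported during iteration.
try:
    list(events)
    raised = False
except etree.XMLSyntaxError as exc:
    raised = True
    print("validation failed: %s" % exc)
```

Wrapping the iterator in list() is only one way to consume it; a plain for loop over the events triggers the same parsing and validation.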
Stefan

Stefan Behnel wrote on 28.07.20 at 13:05:
> Note that etree.iterparse() returns an iterator that parses incrementally. Creating it does not parse the input by itself. To trigger the parsing (and thus any parsing or validation errors), you have to iterate over it. That is what the call to list() in the test function does.

Thank you very much for the quick response, I overlooked that - you are right, when iterating over the iterparse iterator it is working as expected.
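
For reference, a sketch of how the context-manager form of the test could look once the iterator is consumed inside the with block. The class and method names are hypothetical and not taken from lxml's test suite; Python 3 and an installed lxml are assumed.

```python
import unittest
from io import BytesIO
from lxml import etree

XSD = b'''<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="a" type="AType"/>
  <xsd:complexType name="AType">
    <xsd:sequence>
      <xsd:element name="b" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>'''


class IterparseSchemaTest(unittest.TestCase):  # hypothetical test case
    def setUp(self):
        self.schema = etree.XMLSchema(etree.parse(BytesIO(XSD)))

    def test_invalid_document_raises_on_iteration(self):
        events = etree.iterparse(BytesIO(b'<a><c></c></a>'),
                                 schema=self.schema)
        # The iterator must be consumed inside the with block,
        # otherwise nothing is parsed and nothing can be raised.
        with self.assertRaises(etree.XMLSyntaxError):
            list(events)

    def test_valid_document_parses(self):
        events = etree.iterparse(BytesIO(b'<a><b>text</b></a>'),
                                 schema=self.schema)
        # Default events are 'end' only, so the child ends before the root.
        tags = [element.tag for _, element in events]
        self.assertEqual(tags, ['b', 'a'])
```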
Kai