Isoschematron.Schematron not working as expected
Hello everyone,

while working on an ISO Schematron validation routine, I noticed that reports, even when triggered correctly, do not make validation return False. If I check the validation_report, the report element is triggered correctly, but the final result is True and the error_log is empty. Finally I went back to a very simple example included in the isoschematron.Schematron docstring:
    >>> from lxml import etree
    >>> from lxml import isoschematron
    >>> schematron = isoschematron.Schematron(etree.XML('''
    ... <schema xmlns="http://purl.oclc.org/dsdl/schematron" >
    ...   <pattern id="id_only_attribute">
    ...     <title>id is the only permitted attribute name</title>
    ...     <rule context="*">
    ...       <report test="@*[not(name()='id')]">Attribute
    ...         <name path="@*[not(name()='id')]"/> is forbidden<name/>
    ...       </report>
    ...     </rule>
    ...   </pattern>
    ... </schema>
    ... '''))

    >>> xml = etree.XML('''
    ... <AAA name="aaa">
    ...   <BBB id="bbb"/>
    ...   <CCC color="ccc"/>
    ... </AAA>
    ... ''')

    >>> schematron.validate(xml)
    0

    >>> xml = etree.XML('''
    ... <AAA id="aaa">
    ...   <BBB id="bbb"/>
    ...   <CCC/>
    ... </AAA>
    ... ''')

    >>> schematron.validate(xml)
    1
When I run the code above, validate() always returns True, even for the invalid XML input. Same situation as before: the validation_report is correct, but the return value is True and the error_log is empty. I'm running Python 3.4.1 with lxml 3.5dev0; lxml 3.4.2 gives the same result. Am I doing something horribly wrong without realizing it, or is there actually a bug here?

Best regards,
Pierpaolo Da Fieno
Hi Pierpaolo,
It's definitely a bug in the documentation, i.e. the docstring is wrong, probably since the dawn of lxml isoschematron times (and I wonder why doctest never caught this...).

Anyway: the current implementation only counts failed asserts as errors, not reports that have been triggered. This XPath is used to look for error elements in the SVRL result report:

isoschematron/__init__.py:

    [...]
    # svrl result accessors
    svrl_validation_errors = _etree.XPath(
        '//svrl:failed-assert', namespaces={'svrl': SVRL_NS})
    [...]

I *think* the rationale was that reports should be usable in addition to the validation results, but I'd need to look into this a bit more. It's been a while.

That said, you can easily change the behaviour by using a custom schematron validator class. Inheriting from class Schematron and overriding the _validation_errors class attribute in your subclass with an etree.XPath object that detects whatever you deem an error in the resulting validation report document should do the trick:

    class Schematron(_etree._Validator):
        [...]
        # etree.XPath object that determines input document validity when applied to
        # the svrl result report; must return a list of result elements (empty if
        # valid)
        _validation_errors = svrl_validation_errors

There's a customization example in src/lxml/tests/test_isoschematron.py, too.

Holger

Landesbank Baden-Wuerttemberg
Institution under public law
Headquarters: Stuttgart, Karlsruhe, Mannheim, Mainz
HRA 12704, Amtsgericht Stuttgart (Stuttgart Local Court)
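A minimal, stdlib-only sketch of the counting rule Holger describes. It assumes a hand-written and heavily abbreviated SVRL report for the invalid <AAA name="aaa"> document (real isoschematron output contains more elements and different location paths), uses xml.etree.ElementTree instead of lxml, and find_errors is a hypothetical helper. Counting only svrl:failed-assert finds nothing, so the document passes; also counting svrl:successful-report finds both forbidden attributes.

```python
import xml.etree.ElementTree as ET

SVRL_NS = "http://purl.oclc.org/dsdl/svrl"
FAILED_ASSERT = f"{{{SVRL_NS}}}failed-assert"
SUCCESSFUL_REPORT = f"{{{SVRL_NS}}}successful-report"

# Hand-written, abbreviated SVRL (an assumption, not captured lxml output):
# the <report> fires for @name on AAA and @color on CCC, and since the
# schema has no <assert> rules there is no failed-assert at all.
SVRL_REPORT = """\
<svrl:schematron-output xmlns:svrl="http://purl.oclc.org/dsdl/svrl">
  <svrl:fired-rule context="*"/>
  <svrl:successful-report test="@*[not(name()='id')]" location="/AAA">
    <svrl:text>Attribute name is forbidden</svrl:text>
  </svrl:successful-report>
  <svrl:fired-rule context="*"/>
  <svrl:fired-rule context="*"/>
  <svrl:successful-report test="@*[not(name()='id')]" location="/AAA/CCC">
    <svrl:text>Attribute color is forbidden</svrl:text>
  </svrl:successful-report>
</svrl:schematron-output>
"""


def find_errors(svrl_root, count_reports=False):
    """Return the SVRL result elements treated as errors, in document order."""
    error_tags = {FAILED_ASSERT}
    if count_reports:
        error_tags.add(SUCCESSFUL_REPORT)
    return [el for el in svrl_root.iter() if el.tag in error_tags]


root = ET.fromstring(SVRL_REPORT)
print(len(find_errors(root)))                      # 0: asserts only, "valid"
print(len(find_errors(root, count_reports=True)))  # 2: reports counted too
```

An empty result list is what makes the validator report the document as valid, which matches the True return value and empty error_log seen above.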
Hi Holger,

thank you for the helpful answer. It works flawlessly. Still, I think that a master switch in the Schematron constructor to make it fail on reports would be useful. I updated the code, docstring and tests and sent a pull request to make it happen.

Thanks again.
Pierpaolo

(In reply to Holger Joukl <Holger.Joukl@lbbw.de>, Thu, 26 Feb 2015, 08:59)
_________________________________________________________________
Mailing list for the lxml Python XML toolkit - http://lxml.de/
lxml@lxml.de
https://mailman-mail5.webfaction.com/listinfo/lxml
Hi,
> thank you for the helpful answer. It works flawlessly. Still, I think that a master switch in the Schematron constructor to make it fail on reports would be useful. I updated the code, docstring and tests and sent a pull request to make it happen.
I vaguely remember that it really was a conscious decision not to count report results as errors.

That said, it might (?) be practical to allow changing the behaviour without subclassing, but I'd probably prefer to use the standard error-counting XPath as a default argument rather than a switch. That would also cover other potential requirements as to what to deem an error. So essentially this boils down to the question of whether there is a nicer way, or at least a choice other than customization-by-subclassing.

I've seen Stefan already merged your pull request; but does the switch really play well with the current customization-by-subclassing? svrl_validation_errors_complete should IMHO then be added as a class attribute, otherwise a subclass cannot provide customized "complete" (for lack of a better word) error validation.

But it's really nice to see that someone else is finally using isoschematron... :-)

Holger
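Holger's alternative, a replaceable error selector passed as a constructor argument with the asserts-only rule as its default, can be sketched without lxml. Everything here is hypothetical: ToyValidator, error_selector and the two SELECT_* class attributes are illustrative names, not lxml API, and the class only judges an SVRL report assumed to exist already. Keeping the standard selectors on the class is what still lets a subclass customize them, which is the concern raised above.

```python
import xml.etree.ElementTree as ET

SVRL_NS = "http://purl.oclc.org/dsdl/svrl"
FAILED_ASSERT = f"{{{SVRL_NS}}}failed-assert"
SUCCESSFUL_REPORT = f"{{{SVRL_NS}}}successful-report"


def select_asserts(svrl_root):
    """Default rule: only failed asserts count as errors."""
    return [el for el in svrl_root.iter() if el.tag == FAILED_ASSERT]


def select_asserts_and_reports(svrl_root):
    """Stricter rule: triggered reports count as errors too."""
    wanted = {FAILED_ASSERT, SUCCESSFUL_REPORT}
    return [el for el in svrl_root.iter() if el.tag in wanted]


class ToyValidator:
    """Hypothetical stand-in that judges an existing SVRL report (no lxml)."""

    # Standard rules live on the class, so subclasses can override them.
    SELECT_ASSERTS = staticmethod(select_asserts)
    SELECT_ASSERTS_AND_REPORTS = staticmethod(select_asserts_and_reports)

    def __init__(self, error_selector=None):
        # None means "use the class default", which respects subclass overrides.
        if error_selector is None:
            error_selector = self.SELECT_ASSERTS
        self._error_selector = error_selector
        self.errors = []

    def validate(self, svrl_root):
        """Return True when the selector finds no error elements."""
        self.errors = self._error_selector(svrl_root)
        return not self.errors


report = ET.fromstring(
    '<svrl:schematron-output xmlns:svrl="http://purl.oclc.org/dsdl/svrl">'
    '<svrl:successful-report test="@name" location="/AAA"/>'
    '</svrl:schematron-output>'
)
print(ToyValidator().validate(report))  # True: the report is ignored
strict = ToyValidator(ToyValidator.SELECT_ASSERTS_AND_REPORTS)
print(strict.validate(report))          # False: the report counts
```

Any other callable of the same shape (root in, list of result elements out) could be passed the same way, so one argument covers both the "strict" case and more specialised definitions of an error.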
Holger Joukl wrote on 27.02.2015 at 08:46:
>> thank you for the helpful answer. It works flawlessly. Still, I think that a master switch in the Schematron constructor to make it fail on reports would be useful. I updated the code, docstring and tests and sent a pull request to make it happen.
> I vaguely remember that it really was a conscious decision not to count report results as errors.

> That said, it might (?) be practical to allow changing the behaviour without subclassing, but I'd probably prefer to use the standard error-counting XPath as a default argument rather than a switch.
I think a switch is OK here. It's a "strict" mode that means "if there's any report at all, I want validation to fail". Requiring users to pass in an XPath for that means they have to take care to properly build or find the right expression. It's too simple a case for that overhead, and tighter error counting can still be achieved by subclassing if a user really needs it. The two options are not mutually exclusive.
> svrl_validation_errors_complete should IMHO then be added as a class attribute, otherwise a subclass cannot provide customized "complete" (for lack of a better word) error validation.
I renamed it to "svrl_validation_errors_and_reports" because that's what it finds. And subclasses that provide their own selection can simply remove the "fail_on_report" option from their constructor and pass fail_on_report=False to their superclass.

Stefan
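For comparison, a hypothetical sketch of the design Stefan describes, again stdlib-only and again judging an already-produced SVRL report rather than running Schematron. A boolean fail_on_report switch chooses between a default and a stricter class-level selector, and a subclass with its own selection drops the option and always passes fail_on_report=False upwards. SwitchValidator, RoleAwareValidator and the role="error" criterion are illustrative assumptions, not lxml's actual implementation.

```python
import xml.etree.ElementTree as ET

SVRL_NS = "http://purl.oclc.org/dsdl/svrl"
FAILED_ASSERT = f"{{{SVRL_NS}}}failed-assert"
SUCCESSFUL_REPORT = f"{{{SVRL_NS}}}successful-report"


def _select(svrl_root, tags):
    """Return result elements whose tag is in `tags`, in document order."""
    return [el for el in svrl_root.iter() if el.tag in tags]


class SwitchValidator:
    """Hypothetical toy with a 'strict' switch; not lxml's Schematron."""

    # Default selection (asserts only) and the strict one (asserts and reports).
    _validation_errors = staticmethod(
        lambda root: _select(root, {FAILED_ASSERT}))
    _validation_errors_and_reports = staticmethod(
        lambda root: _select(root, {FAILED_ASSERT, SUCCESSFUL_REPORT}))

    def __init__(self, fail_on_report=False):
        self._fail_on_report = fail_on_report
        self.errors = []

    def validate(self, svrl_root):
        finder = (self._validation_errors_and_reports if self._fail_on_report
                  else self._validation_errors)
        self.errors = finder(svrl_root)
        return not self.errors


class RoleAwareValidator(SwitchValidator):
    """Own selection: failed asserts, plus reports marked role="error".
    The switch is dropped and False is always passed to the superclass."""

    _validation_errors = staticmethod(lambda root: [
        el for el in _select(root, {FAILED_ASSERT, SUCCESSFUL_REPORT})
        if el.tag == FAILED_ASSERT or el.get("role") == "error"])

    def __init__(self):
        super().__init__(fail_on_report=False)


report = ET.fromstring(
    '<svrl:schematron-output xmlns:svrl="http://purl.oclc.org/dsdl/svrl">'
    '<svrl:successful-report test="@color" location="/AAA/CCC"/>'
    '<svrl:successful-report test="@name" role="error" location="/AAA"/>'
    '</svrl:schematron-output>'
)
print(SwitchValidator().validate(report))                     # True
print(SwitchValidator(fail_on_report=True).validate(report))  # False
checker = RoleAwareValidator()
print(checker.validate(report), len(checker.errors))          # False 1
```

The switch covers the common "any report fails" case without asking users for an XPath, while the subclass keeps full control over what counts as an error; the two mechanisms coexist as Stefan argues.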
participants (3)
- Holger Joukl
- Pierpaolo Da Fieno
- Stefan Behnel