cannot load Schematron stylesheet: "<string>:0:0:ERROR:XSLT:ERR_OK: unknown error"
I am able to reproduce erroneous behavior with Python 2.7 and 3.4 and lxml 3.6.0. Processing is correct when using xsltproc directly:

    import lxml.etree as ET
    sf = 'rules/schtron/rule04E.xsl'
    xslt = ET.fromstring(open(sf).read())
    transform = ET.XSLT(xslt)
    # at this point transform.error_log.last_error contains
    # "<string>:0:0:ERROR:XSLT:ERR_OK: unknown error"

The stylesheet is: https://github.com/rhoerbe/saml_schematron/blob/master/rules/schtron/rule04E... and has been generated from this schematron: https://github.com/rhoerbe/saml_schematron/blob/master/rules/schtron/rule04E...

The code works with other (simple?) stylesheets. Are there any restrictions in lxml that prevent the processing of ISO-schematron stylesheets? Or is this a bug that should be filed?

Regards,
Rainer
Hi,
Hm, this sample code doesn't apply the stylesheet to an input document?
The stylesheet is: https://github.com/rhoerbe/saml_schematron/blob/master/rules/schtron/rule04E.xsl and has been generated from this schematron: https://github.com/rhoerbe/saml_schematron/blob/master/rules/schtron/rule04E.sch
Can't reproduce:

    Python 2.7.5 (default, Aug 12 2013, 15:01:02)
    [GCC 4.7.3] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from lxml import etree
    >>> print etree.__version__
    3.6.0
    >>> xsl_file = "rule04E.xsl"
    >>> xsl_doc = etree.parse(xsl_file)
    >>> transform = etree.XSLT(xsl_doc)
    >>> print transform.error_log

    >>> print transform.error_log.last_error
    None
    >>> from lxml import isoschematron
    >>> sch_file = "rule04E.sch"
    >>> sch_doc = etree.parse(sch_file)
    >>> schematron = isoschematron.Schematron(sch_doc, store_xslt=True)
    >>> print type(schematron.validator_xslt)
    <type 'lxml.etree._XSLTResultTree'>
    >>> print schematron._validator.error_log

    >>> print schematron._validator.error_log.last_error
    None
Of course, neither the XSL transformation nor the schematron validation is actually applied to any input document here, so an empty error log is probably to be expected.

(Note that I needed to add the <iso:schema> root element to rule04E.sch and fix the pattern id attribute to make it a valid stand-alone schematron schema, processable with isoschematron.Schematron, i.e.

    <iso:schema xmlns:iso="http://purl.oclc.org/dsdl/schematron">
      <iso:ns uri="urn:oasis:names:tc:SAML:2.0:metadata" prefix="md"/>
      <iso:ns uri="http://www.w3.org/2000/09/xmldsig#" prefix="ds"/>
      <iso:pattern id="Rule4">
        <iso:rule context="//md:IDPSSODescriptor">
          <iso:assert test="md:KeyDescriptor[@use='signing' or not (@use)]/ds:KeyInfo/ds:X509Data/ds:X509Certificate">
            Error (04): Each IDPSSODescriptor must contain a signing key as X509Certificate (child element of X509Data)
          </iso:assert>
        </iso:rule>
        <iso:rule context="//md:SPSSODescriptor">
          <iso:assert test="descendant::ds:X509Data/ds:X509Certificate">
            Error (04): Each SPSSODescriptor must contain a signing key as X509Certificate (child element of X509Data)
          </iso:assert>
        </iso:rule>
      </iso:pattern>
    </iso:schema>

)

Holger

Landesbank Baden-Wuerttemberg Anstalt des oeffentlichen Rechts Hauptsitze: Stuttgart, Karlsruhe, Mannheim, Mainz HRA 12704 Amtsgericht Stuttgart
Hi Holger
Hm, this sample code doesn't apply the stylesheet to an input document?
I left out the further code because I think that the error is in the XSLT object, but I may be wrong. Here it is:

    df = 'testdata/idp_incomplete.xml'
    md_dom = ET.parse(df)
    out_dom = transform(md_dom)
    print(ET.tostring(out_dom, xml_declaration=False, encoding='utf-8'))
Can't reproduce:
Actually I have a docker container that demonstrates the problem. Steps (with docker installed):

    git clone https://github.com/rhoerbe/saml_schematron.git
    cd saml_schematron/docker
    ./build.sh
    docker run -it --name xslttest r2h2/samlschtron4 bash

Then, in the container:

    cd /opt/saml_schematron/
    python3.4
    # exec the above code, it will output "None"
    exit()
    xsltproc rules/schtron/rule04E.xsl testdata/idp_incomplete.xml  # show expected behavior

Thanks, your help is appreciated.
- Rainer
_________________________________________________________________ Mailing list for the lxml Python XML toolkit - http://lxml.de/ lxml@lxml.de https://mailman-mail5.webfaction.com/listinfo/lxml
I left out the further code because I think that the error is in the XSLT object, but I may be wrong. Here it is:

    df = 'testdata/idp_incomplete.xml'
    md_dom = ET.parse(df)
    out_dom = transform(md_dom)
    print(ET.tostring(out_dom, xml_declaration=False, encoding='utf-8'))
No, the actual transform leads to the error, see below.
Corporate env and no quick way to fire up a docker install here.

Anyway, I can confirm xsltproc & lxml indeed behave differently on your rule04E.xsl:

    $ xsltproc rule04E.xsl idp_incomplete.xml
    Info: Validating entityID https://idp.example.org/idp.xml XPATH: /md:EntitiesDescriptor[1] /md:EntityDescriptor[1] validation rule: (@entityID)
    Error (04): Each IDPSSODescriptor must contain a signing key as X509Certificate (child element of X509Data) XPATH: /md:EntitiesDescriptor[1] /md:EntityDescriptor[1] /md:IDPSSODescriptor[1] validation rule: (md:KeyDescriptor[@use='signing' or not (@use)]/ds:KeyInfo/ds:X509Data/ds:X509Certificate)

    $ python2.7 -c 'from lxml import etree; xsl = etree.XSLT(etree.parse("rule04E.xsl")); output = xsl(etree.parse("idp_incomplete.xml")); print output; print xsl.error_log'
    <string>:0:0:ERROR:XSLT:ERR_OK: unknown error
    <string>:0:0:ERROR:XSLT:ERR_OK: unknown error

Strangely, if I just load rule04E.xsl in the oXygen XML editor, reindent & save, I can now successfully run:

    $ python2.7 -c 'from lxml import etree; xsl = etree.XSLT(etree.parse("rule04E_oxygen.xsl")); output = xsl(etree.parse("idp_incomplete.xml")); print output; print xsl.error_log'
    <string>:0:0:ERROR:XSLT:ERR_OK: Info: Validating entityID https://idp.example.org/idp.xml XPATH: /md:EntitiesDescriptor[1] /md:EntityDescriptor[1] validation rule: (@entityID)
    <string>:0:0:ERROR:XSLT:ERR_OK: Error (04): Each IDPSSODescriptor must contain a signing key as X509Certificate (child element of X509Data) XPATH: /md:EntitiesDescriptor[1] /md:EntityDescriptor[1] /md:IDPSSODescriptor[1] validation rule: (md:KeyDescriptor[@use='signing' or not(@use)]/ds:KeyInfo/ds:X509Data/ds:X509Certificate)

(oXygen's reindent does what it says, basically pretty-printing, so it may well remove whitespace & whatnot.)

No idea what makes etree.XSLT() choke on the original xsl file, in contrast to xsltproc.
Holger
On 01.04.2016 at 17:40, Holger Joukl <Holger.Joukl@LBBW.de> wrote:
Anyway, I can confirm xsltproc & lxml indeed behave differently on your rule04E.xsl:
I have dozens of these where lxml fails :-(
<string>:0:0:ERROR:XSLT:ERR_OK: unknown error
Strangely, if I just load rule04E.xsl in oXygen XML editor, reindent & save I can now successfully run:
Please could you provide me with the edited file (pastebin.com etc.)? Thanks, Rainer
Sorry, all things looking like "internet storage" are generally blocked here for me. Maybe you could rather download oXygen and try this out yourself; it comes fully functional with an evaluation period.

Btw, have you tried lxml's isoschematron functionality directly with the original stand-alone schematron schema, or an xsd with included schematron rules, as input?

Holger
Resolved. I failed to use the stylesheet's error_log attribute. I should have noticed, as xsltproc writes to stderr in a similar manner. Thank you for providing me with the minimized example; it helped me to find this out.

The lxml output is a bit ugly, like "<string>:0:0:ERROR:XSLT:ERR_OK: unknown error" instead of an empty string, but I can live with that and it saves me from using Xerces or rewriting the schematron rules as xslt or xpath.

- Rainer
Rainer Hoerbe wrote on 05.04.2016 at 18:54:
The lxml output is a bit ugly, like "<string>:0:0:ERROR:XSLT:ERR_OK: unknown error" instead of an empty string, but I can live with that and it saves me from using Xerces or rewriting the schematron rules as xslt or xpath.
Note that the result is not just a string but a log entry object. You can ask it for its details programmatically. The (admittedly ugly) fact that it says both "ERR_OK" and "unknown error" is because the error reporting interface in libxslt is, well, somewhat suboptimal.
Stefan
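To illustrate the point about log entry objects, here is a minimal sketch of reading the entries field by field instead of printing the whole log. The inline stylesheet and the `<root/>` input are made-up stand-ins for rule04E.xsl and the SAML metadata, not files from this thread; the attributes used (`level_name`, `domain_name`, `type_name`, `message`) are the ones lxml exposes on its log entries.

```python
from lxml import etree

# Hypothetical stand-in stylesheet: it emits a single xsl:message, similar to
# the "Info:" / "Error (04):" lines produced by the generated stylesheets.
XSL = b"""\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:message>Info: validating <xsl:value-of select="name(/*)"/></xsl:message>
  </xsl:template>
</xsl:stylesheet>
"""

transform = etree.XSLT(etree.fromstring(XSL))
transform(etree.fromstring(b"<root/>"))  # messages only appear once applied

# Each entry is an object; pick out the parts rather than the formatted line.
entries = [
    (entry.level_name, entry.domain_name, entry.type_name, entry.message)
    for entry in transform.error_log
]
for level, domain, type_name, message in entries:
    print(level, domain, type_name, repr(message))
```

Reading `message` alone gives the xsl:message text without the "<string>:0:0:ERROR:XSLT:ERR_OK:" prefix, which may be all a validation report needs.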
I was now able to reproduce with several unit tests that reformatting the stylesheet gets rid of the bug where lxml reports "<string>:0:0:ERROR:XSLT:ERR_OK: unknown error" when it should not. Creating stylesheets with Xerces/J instead of xsltproc did not help either.

I reported this as bug #1567633.

- Rainer
I still don't quite follow. While there seems to be different behaviour between xsltproc and etree.XSLT: how do you create the validating XSLs from the schematron schema? Looks like this works just fine if I use lxml.isoschematron directly on the schematron schema:

    >>> from lxml import etree, isoschematron
    >>> print etree.__version__
    3.6.0
    >>> print etree.LIBXML_VERSION, etree.LIBXML_COMPILED_VERSION
    (2, 9, 1) (2, 9, 1)
    >>> print etree.LIBXSLT_VERSION, etree.LIBXSLT_COMPILED_VERSION
    (1, 1, 28) (1, 1, 28)
    >>> schematron = isoschematron.Schematron(file="/tmp/rule04E_complete.sch", store_report=True)
    >>> doc = etree.parse("/tmp/rule4_test1_idp_missing_key.xml")
    >>> schematron.validate(doc)
    False
    >>> print schematron.validation_report
    <?xml version="1.0" standalone="yes"?>
    <svrl:schematron-output xmlns:svrl="http://purl.oclc.org/dsdl/svrl"
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        xmlns:schold="http://www.ascc.net/xml/schematron"
        xmlns:sch="http://www.ascc.net/xml/schematron"
        xmlns:iso="http://purl.oclc.org/dsdl/schematron"
        xmlns:md="urn:oasis:names:tc:SAML:2.0:metadata"
        xmlns:ds="http://www.w3.org/2000/09/xmldsig#"
        xmlns:rpi="urn:oasis:names:tc:SAML:metadata:rpi"
        xmlns:mdui="urn:oasis:names:tc:SAML:metadata:ui"
        xmlns:alg="urn:oasis:names:tc:SAML:metadata:algsupport"
        xmlns:saml="urn:oasis:names:tc:SAML:2.0:assertion"
        xmlns:idpdisc="urn:oasis:names:tc:SAML:profiles:SSO:idp-discovery-protocol"
        xmlns:mdattr="urn:oasis:names:tc:SAML:metadata:attribute"
        xmlns:init="urn:oasis:names:tc:SAML:profiles:SSO:request-init"
        title="" schemaVersion="ISO19757-3">
      <!-- -->
      <svrl:ns-prefix-in-attribute-values uri="urn:oasis:names:tc:SAML:2.0:metadata" prefix="md"/>
      <svrl:ns-prefix-in-attribute-values uri="http://www.w3.org/2000/09/xmldsig#" prefix="ds"/>
      <svrl:ns-prefix-in-attribute-values uri="urn:oasis:names:tc:SAML:metadata:rpi" prefix="rpi"/>
      <svrl:ns-prefix-in-attribute-values uri="urn:oasis:names:tc:SAML:metadata:ui" prefix="mdui"/>
      <svrl:ns-prefix-in-attribute-values uri="urn:oasis:names:tc:SAML:metadata:algsupport" prefix="alg"/>
      <svrl:ns-prefix-in-attribute-values uri="urn:oasis:names:tc:SAML:2.0:assertion" prefix="saml"/>
      <svrl:ns-prefix-in-attribute-values uri="urn:oasis:names:tc:SAML:profiles:SSO:idp-discovery-protocol" prefix="idpdisc"/>
      <svrl:ns-prefix-in-attribute-values uri="urn:oasis:names:tc:SAML:metadata:attribute" prefix="mdattr"/>
      <svrl:ns-prefix-in-attribute-values uri="urn:oasis:names:tc:SAML:profiles:SSO:request-init" prefix="init"/>
      <svrl:active-pattern id="Rule04" name="Rule04"/>
      <svrl:fired-rule context="//md:IDPSSODescriptor"/>
      <svrl:failed-assert test="md:KeyDescriptor[@use='signing' or not (@use)]/ds:KeyInfo/ds:X509Data/ds:X509Certificate"
          location="/*[local-name()='EntityDescriptor' and namespace-uri()='urn:oasis:names:tc:SAML:2.0:metadata']/*[local-name()='IDPSSODescriptor' and namespace-uri()='urn:oasis:names:tc:SAML:2.0:metadata']">
        <svrl:text> Error (04): Each IDPSSODescriptor must contain a signing key as X509Certificate (child element of X509Data) </svrl:text>
      </svrl:failed-assert>
    </svrl:schematron-output>

Internally, lxml.isoschematron generates the validation stylesheets based on the isoschematron skeleton reference implementation (the XSLs are included in the lxml.isoschematron package).

Holger
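The session above depends on files under /tmp that are not part of the thread. A self-contained sketch of the same approach might look like the following; the schema is the stand-alone one quoted earlier, trimmed to the IDPSSODescriptor rule with a shortened message, and the metadata document is a made-up minimal input, not the project's test data.

```python
from lxml import etree, isoschematron

SVRL_NS = "http://purl.oclc.org/dsdl/svrl"

# Stand-alone schema, trimmed to the IDPSSODescriptor rule from the thread.
SCHEMA = b"""\
<iso:schema xmlns:iso="http://purl.oclc.org/dsdl/schematron">
  <iso:ns uri="urn:oasis:names:tc:SAML:2.0:metadata" prefix="md"/>
  <iso:ns uri="http://www.w3.org/2000/09/xmldsig#" prefix="ds"/>
  <iso:pattern id="Rule4">
    <iso:rule context="//md:IDPSSODescriptor">
      <iso:assert test="md:KeyDescriptor[@use='signing' or not(@use)]/ds:KeyInfo/ds:X509Data/ds:X509Certificate">
        Error (04): Each IDPSSODescriptor must contain a signing key
      </iso:assert>
    </iso:rule>
  </iso:pattern>
</iso:schema>
"""

# Hypothetical input: an IdP descriptor without any KeyDescriptor.
DOC = b"""\
<md:EntityDescriptor xmlns:md="urn:oasis:names:tc:SAML:2.0:metadata"
    entityID="https://idp.example.org/idp.xml">
  <md:IDPSSODescriptor
      protocolSupportEnumeration="urn:oasis:names:tc:SAML:2.0:protocol"/>
</md:EntityDescriptor>
"""

# store_report=True keeps the SVRL report of the last validation run.
schematron = isoschematron.Schematron(etree.fromstring(SCHEMA), store_report=True)
is_valid = schematron.validate(etree.fromstring(DOC))

# Collect the failed assertions from the SVRL report.
failed = schematron.validation_report.xpath(
    "//svrl:failed-assert", namespaces={"svrl": SVRL_NS}
)
print(is_valid, len(failed))
for node in failed:
    print(node.get("location"))
    print(node.findtext("{%s}text" % SVRL_NS).strip())
```

Working from the SVRL report gives structured access to the failing XPath location and the assertion text, rather than scraping xsl:message output from the error log.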
Hi Holger,

Thanks for the guidance towards using lxml.isoschematron - I was missing this.

However, one new problem appeared: "ValueError: Unicode strings with encoding declaration are not supported". While I can easily work around that with the schematron files which are part of the project, the files to be validated are both UTF-8 and various 8-bit flavors (win1252, ..). How to deal with this in Python 3? Do I have to process the files upfront (extract the encoding, convert to UTF-8 and remove the encoding declaration)? The doc on http://lxml.de/parsing.html#python-unicode-strings reads like a Python 2 hack.

- Rainer
Hi,
Why would you need to read from unicode strings?

Let lxml parse the files: The XML parser is responsible for doing the decoding and will do this just fine unless the input files are severely broken.

Or read from byte strings, if for some reason you really need to read the file contents yourself. lxml should happily accept byte strings with encoding declarations. In other words, don't decode the read binary data to unicode before passing to lxml.

If you absolutely cannot do this - but I'd bet you can - you could still manually remove the encoding declaration from the start of the unicode string and then feed into fromstring().

Holger
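A small sketch of the difference, under the assumption that ISO-8859-1 is an acceptable stand-in for the "8-bit flavors" mentioned above (the sample document is invented for illustration):

```python
from lxml import etree

# Assumed sample: an ISO-8859-1 encoded document carrying its own declaration.
raw = '<?xml version="1.0" encoding="ISO-8859-1"?><name>Caf\u00e9</name>'.encode("latin-1")

# Bytes in: the parser reads the declaration and does the decoding itself.
root = etree.fromstring(raw)
print(root.text)

# Decoded str with the declaration still present: lxml refuses it.
try:
    etree.fromstring(raw.decode("latin-1"))
    rejected = False
except ValueError as exc:
    rejected = True
    print(exc)
```

For files on disk, `etree.parse(path)` takes the same route, since the parser reads the file as bytes and honours the declared encoding.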
Got it - I need to parse the file directly. Thanks, Rainer
participants (3)
- Holger Joukl
- Rainer Hoerbe
- Stefan Behnel