cannot load Schematron stylesheet: "<string>:0:0:ERROR:XSLT:ERR_OK: unknown error"

I can reproduce erroneous behavior with Python 2.7 and 3.4 + lxml 3.6.0. Processing is correct when using xsltproc directly:
    import lxml.etree as ET
    sf = 'rules/schtron/rule04E.xsl'
    xslt = ET.fromstring(open(sf).read())
    transform = ET.XSLT(xslt)
    # at this point transform.error_log.last_error contains
    # "<string>:0:0:ERROR:XSLT:ERR_OK: unknown error"
The stylesheet is: https://github.com/rhoerbe/saml_schematron/blob/master/rules/schtron/rule04E...
and has been generated from this schematron: https://github.com/rhoerbe/saml_schematron/blob/master/rules/schtron/rule04E...
The code works with other (simpler?) stylesheets. Are there any restrictions in lxml that prevent the processing of ISO Schematron stylesheets? Or is this a bug that should be filed?
Regards,
Rainer

Hi,
Hm, this sample code doesn't apply the stylesheet to an input document?
Can't reproduce:
    Python 2.7.5 (default, Aug 12 2013, 15:01:02)
    [GCC 4.7.3] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> print schematron._validator.error_log.last_error
    None
Of course, neither an XSL transformation nor a Schematron validation is actually applied to any input document here, so an empty error log is probably to be expected. (Note that I needed to add the <iso:schema> root element to rule04E.sch and fix the pattern id attribute to make it a valid stand-alone Schematron schema, processable with isoschematron.Schematron, i.e.:
    <iso:schema xmlns:iso="http://purl.oclc.org/dsdl/schematron">
      <iso:ns uri="urn:oasis:names:tc:SAML:2.0:metadata" prefix="md"/>
      <iso:ns uri="http://www.w3.org/2000/09/xmldsig#" prefix="ds"/>
      <iso:pattern id="Rule4">
        <iso:rule context="//md:IDPSSODescriptor">
          <iso:assert test="md:KeyDescriptor[@use='signing' or not (@use)]/ds:KeyInfo/ds:X509Data/ds:X509Certificate">
            Error (04): Each IDPSSODescriptor must contain a signing key as X509Certificate (child element of X509Data)
          </iso:assert>
        </iso:rule>
        <iso:rule context="//md:SPSSODescriptor">
          <iso:assert test="descendant::ds:X509Data/ds:X509Certificate">
            Error (04): Each SPSSODescriptor must contain a signing key as X509Certificate (child element of X509Data)
          </iso:assert>
        </iso:rule>
      </iso:pattern>
    </iso:schema>
)
Holger
Landesbank Baden-Wuerttemberg
Anstalt des oeffentlichen Rechts
Hauptsitze: Stuttgart, Karlsruhe, Mannheim, Mainz
HRA 12704 Amtsgericht Stuttgart
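Holger's point that nothing has been validated yet can be checked end to end: compiling the schema fills no log, and only a call on an actual document does. A minimal sketch that reuses his corrected schema; the two sample documents are hypothetical stand-ins for the repository's testdata files, and the certificate value is a placeholder.

```python
from lxml import etree, isoschematron

# Holger's corrected stand-alone schema (assert texts kept on one line each).
SCHEMA = b"""<iso:schema xmlns:iso="http://purl.oclc.org/dsdl/schematron">
  <iso:ns uri="urn:oasis:names:tc:SAML:2.0:metadata" prefix="md"/>
  <iso:ns uri="http://www.w3.org/2000/09/xmldsig#" prefix="ds"/>
  <iso:pattern id="Rule4">
    <iso:rule context="//md:IDPSSODescriptor">
      <iso:assert test="md:KeyDescriptor[@use='signing' or not (@use)]/ds:KeyInfo/ds:X509Data/ds:X509Certificate">Error (04): Each IDPSSODescriptor must contain a signing key as X509Certificate (child element of X509Data)</iso:assert>
    </iso:rule>
    <iso:rule context="//md:SPSSODescriptor">
      <iso:assert test="descendant::ds:X509Data/ds:X509Certificate">Error (04): Each SPSSODescriptor must contain a signing key as X509Certificate (child element of X509Data)</iso:assert>
    </iso:rule>
  </iso:pattern>
</iso:schema>"""

# Hypothetical inputs: an IdP without any key, and one with a signing certificate.
MISSING_CERT = b"""<md:EntityDescriptor xmlns:md="urn:oasis:names:tc:SAML:2.0:metadata"
    entityID="https://idp.example.org/idp.xml">
  <md:IDPSSODescriptor protocolSupportEnumeration="urn:oasis:names:tc:SAML:2.0:protocol"/>
</md:EntityDescriptor>"""

WITH_CERT = b"""<md:EntityDescriptor xmlns:md="urn:oasis:names:tc:SAML:2.0:metadata"
    xmlns:ds="http://www.w3.org/2000/09/xmldsig#"
    entityID="https://idp.example.org/idp.xml">
  <md:IDPSSODescriptor protocolSupportEnumeration="urn:oasis:names:tc:SAML:2.0:protocol">
    <md:KeyDescriptor use="signing">
      <ds:KeyInfo><ds:X509Data>
        <ds:X509Certificate>PLACEHOLDER</ds:X509Certificate>
      </ds:X509Data></ds:KeyInfo>
    </md:KeyDescriptor>
  </md:IDPSSODescriptor>
</md:EntityDescriptor>"""

# Compiling the schema only builds the validator XSLT; no document is checked yet.
schematron = isoschematron.Schematron(etree.fromstring(SCHEMA))

with_cert_ok = schematron.validate(etree.fromstring(WITH_CERT))
missing_ok = schematron.validate(etree.fromstring(MISSING_CERT))
print(with_cert_ok, missing_ok)  # True False

# The log is reset on each call, so it now describes the failing document.
print(schematron.error_log.last_error)
```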

Hi Holger
> Hm, this sample code doesn't apply the stylesheet to an input document?
I left out the further code because I think that the error is in the XSLT object, but I may be wrong. Here it is:
    df = 'testdata/idp_incomplete.xml'
    md_dom = ET.parse(df)
    out_dom = transform(md_dom)
    print(ET.tostring(out_dom, xml_declaration=False, encoding='utf-8'))
> Can't reproduce:
Actually, I have a docker container that demonstrates the problem. Steps (with docker installed):
    git clone https://github.com/rhoerbe/saml_schematron.git
    cd saml_schematron/docker
    ./build.sh
    docker run -it --name xslttest r2h2/samlschtron4 bash
Then, in the container:
    cd /opt/saml_schematron/
    python3.4    # exec the above code, it will output "None"
    exit()
    xsltproc rules/schtron/rule04E.xsl testdata/idp_incomplete.xml    # shows the expected behavior
Thanks, your help is appreciated.
- Rainer

No, the actual transform leads to the error, see below.
Corporate env and no quick way to fire up a docker install here. Anyway, I can confirm that xsltproc and lxml indeed behave differently on your rule04E.xsl:
    $ xsltproc rule04E.xsl idp_incomplete.xml
    Info: Validating entityID https://idp.example.org/idp.xml
    XPATH: /md:EntitiesDescriptor[1] /md:EntityDescriptor[1]
    validation rule: (@entityID)
    Error (04): Each IDPSSODescriptor must contain a signing key as X509Certificate (child element of X509Data)
    XPATH: /md:EntitiesDescriptor[1] /md:EntityDescriptor[1] /md:IDPSSODescriptor[1]
    validation rule: (md:KeyDescriptor[@use='signing' or not (@use)]/ds:KeyInfo/ds:X509Data/ds:X509Certificate)
    $ python2.7 -c 'from lxml import etree; xsl = etree.XSLT(etree.parse("rule04E.xsl")); output = xsl(etree.parse("idp_incomplete.xml")); print output; print xsl.error_log'
    <string>:0:0:ERROR:XSLT:ERR_OK: unknown error
    <string>:0:0:ERROR:XSLT:ERR_OK: unknown error
Strangely, if I just load rule04E.xsl in the oXygen XML editor, reindent and save, I can now successfully run:
    $ python2.7 -c 'from lxml import etree; xsl = etree.XSLT(etree.parse("rule04E_oxygen.xsl")); output = xsl(etree.parse("idp_incomplete.xml")); print output; print xsl.error_log'
    <string>:0:0:ERROR:XSLT:ERR_OK: Info: Validating entityID https://idp.example.org/idp.xml XPATH: /md:EntitiesDescriptor[1] /md:EntityDescriptor[1] validation rule: (@entityID)
    <string>:0:0:ERROR:XSLT:ERR_OK: Error (04): Each IDPSSODescriptor must contain a signing key as X509Certificate (child element of X509Data) XPATH: /md:EntitiesDescriptor[1] /md:EntityDescriptor[1] /md:IDPSSODescriptor[1] validation rule: (md:KeyDescriptor[@use='signing' or not(@use)]/ds:KeyInfo/ds:X509Data/ds:X509Certificate)
(oXygen's reindent does what it says, basically pretty-printing, so it may well remove whitespace and whatnot.)
No idea what makes etree.XSLT() choke on the original xsl file, in contrast to xsltproc.
Holger

Sorry, anything that looks like "internet storage" is generally blocked for me here. Maybe you could download oXygen instead and try this out yourself; it comes fully functional for an evaluation period. Btw, have you tried lxml's isoschematron functionality directly, with the original stand-alone Schematron schema or an XSD with embedded Schematron rules as input?
Holger

Resolved. I had failed to use the stylesheet's error_log attribute. I should have noticed, as xsltproc writes to stderr in a similar manner. Thank you for providing me with the minimized example, it did help me find this out. The lxml output is a bit ugly, like "<string>:0:0:ERROR:XSLT:ERR_OK: unknown error" instead of an empty string, but I can live with that, and it saves me from using Xerces or rewriting the Schematron rules as XSLT or XPath.
- Rainer
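Rainer's finding, that the stylesheet's messages end up in lxml's `error_log` where xsltproc prints them to stderr, can be illustrated with a tiny stylesheet. This is a hypothetical example, not the generated rule04E.xsl; it assumes the Schematron-generated validator reports through `xsl:message`, which fits the "Info:"/"Error (04):" lines quoted earlier in the thread.

```python
from lxml import etree

# Hypothetical stylesheet that reports progress via xsl:message, much like the
# Schematron-generated validator prints its "Info: Validating entityID" lines.
STYLESHEET = b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <xsl:message>Info: Validating entityID <xsl:value-of select="/*/@entityID"/></xsl:message>
    <report/>
  </xsl:template>
</xsl:stylesheet>"""

transform = etree.XSLT(etree.fromstring(STYLESHEET))
doc = etree.fromstring(b'<EntityDescriptor entityID="https://idp.example.org/idp.xml"/>')
result = transform(doc)

# The result tree holds only the regular output...
print(etree.tostring(result.getroot()))  # b'<report/>'
# ...while the xsl:message text is collected in the stylesheet's error log.
for entry in transform.error_log:
    print(entry)
```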

Rainer Hoerbe wrote on 05.04.2016 at 18:54:
> The lxml output is a bit ugly, like "<string>:0:0:ERROR:XSLT:ERR_OK: unknown error" instead of an empty string, but I can live with that, and it saves me from using Xerces or rewriting the Schematron rules as XSLT or XPath.
Note that the result is not just a string but a log entry object. You can ask it for its details programmatically. The (admittedly ugly) fact that it says both "ERR_OK" and "unknown error" is because the error reporting interface in libxslt is, well, somewhat suboptimal.
Stefan
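Following Stefan's note, each entry in `error_log` is an object whose parts (`filename`, `line`, `column`, `level_name`, `domain_name`, `type_name`, `message`) are available as attributes. A minimal sketch, assuming that entries in the `XSLT` domain with type `ERR_OK` are `xsl:message` output (the prefix seen throughout this thread); the helper `split_log` is hypothetical and not part of lxml.

```python
from lxml import etree

STYLESHEET = b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <xsl:message>STARTING</xsl:message>
    <out/>
    <xsl:message>DONE</xsl:message>
  </xsl:template>
</xsl:stylesheet>"""

transform = etree.XSLT(etree.fromstring(STYLESHEET))
transform(etree.fromstring(b"<in/>"))


def split_log(error_log):
    """Separate xsl:message output from other log entries (hypothetical helper).

    Assumption: libxslt reports xsl:message text in the XSLT domain with type
    ERR_OK, so such entries are treated as messages, everything else as a problem.
    """
    messages, problems = [], []
    for entry in error_log:
        if entry.domain_name == "XSLT" and entry.type_name == "ERR_OK":
            messages.append(entry.message)
        else:
            problems.append(entry)
    return messages, problems


for entry in transform.error_log:
    # Each entry exposes its parts separately instead of one formatted string.
    print(entry.filename, entry.line, entry.column,
          entry.level_name, entry.domain_name, entry.type_name, entry.message)

messages, problems = split_log(transform.error_log)
print(messages)  # ['STARTING', 'DONE']
```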

I have now been able to confirm with several unit tests that reformatting the stylesheet gets rid of the bug where lxml reports "<string>:0:0:ERROR:XSLT:ERR_OK: unknown error" when it should not. Creating the stylesheets with Xerces/J instead of xsltproc did not help either. I reported this as bug #1567633.
- Rainer
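Since reformatting made the spurious entries disappear, the oXygen reindent step can be approximated with lxml itself. This is a hypothetical workaround, not a fix for the behaviour reported as bug #1567633, and it is an assumption that dropping ignorable whitespace is the relevant difference: `remove_blank_text=True` discards whitespace-only text nodes the parser considers ignorable, and `pretty_print=True` re-indents the result. The helper `reindent_stylesheet` and the messy inline stylesheet are illustrative only.

```python
from lxml import etree


def reindent_stylesheet(xsl_bytes):
    """Re-serialize a stylesheet with normalized indentation (hypothetical helper)."""
    # Drop whitespace-only text nodes that libxml2 treats as ignorable ...
    parser = etree.XMLParser(remove_blank_text=True)
    root = etree.fromstring(xsl_bytes, parser)
    # ... and let the serializer re-indent the element structure.
    return etree.tostring(root, pretty_print=True,
                          xml_declaration=True, encoding="UTF-8")


# Deliberately messy layout standing in for a generated stylesheet.
ORIGINAL = b"""<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"><xsl:template match="/">
      <xsl:message>Info: checked <xsl:value-of select="name(/*)"/></xsl:message>

   <ok/></xsl:template></xsl:stylesheet>"""

reindented = reindent_stylesheet(ORIGINAL)
transform = etree.XSLT(etree.fromstring(reindented))
result = transform(etree.fromstring(b"<EntityDescriptor/>"))

print(reindented.decode("utf-8"))
print([e.message for e in transform.error_log])
```

Comparing the reindented stylesheet's results against xsltproc on the same input is a reasonable check before relying on it.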

Hi,
I still don't quite follow. There does seem to be different behaviour between xsltproc and etree.XSLT, but how do you create the validating XSLs from the Schematron schema? It looks like this works just fine if I use lxml.isoschematron directly on the Schematron schema:
    -->
    <svrl:ns-prefix-in-attribute-values uri="urn:oasis:names:tc:SAML:2.0:metadata" prefix="md"/>
    <svrl:ns-prefix-in-attribute-values uri="http://www.w3.org/2000/09/xmldsig#" prefix="ds"/>
    <svrl:ns-prefix-in-attribute-values uri="urn:oasis:names:tc:SAML:metadata:rpi" prefix="rpi"/>
    <svrl:ns-prefix-in-attribute-values uri="urn:oasis:names:tc:SAML:metadata:ui" prefix="mdui"/>
    <svrl:ns-prefix-in-attribute-values uri="urn:oasis:names:tc:SAML:metadata:algsupport" prefix="alg"/>
    <svrl:ns-prefix-in-attribute-values uri="urn:oasis:names:tc:SAML:2.0:assertion" prefix="saml"/>
    <svrl:ns-prefix-in-attribute-values uri="urn:oasis:names:tc:SAML:profiles:SSO:idp-discovery-protocol" prefix="idpdisc"/>
    <svrl:ns-prefix-in-attribute-values uri="urn:oasis:names:tc:SAML:metadata:attribute" prefix="mdattr"/>
    <svrl:ns-prefix-in-attribute-values uri="urn:oasis:names:tc:SAML:profiles:SSO:request-init" prefix="init"/>
    <svrl:active-pattern id="Rule04" name="Rule04"/>
    <svrl:fired-rule context="//md:IDPSSODescriptor"/>
    <svrl:failed-assert test="md:KeyDescriptor[@use='signing' or not (@use)]/ds:KeyInfo/ds:X509Data/ds:X509Certificate"
        location="/*[local-name ()='EntityDescriptor' and namespace-uri ()='urn:oasis:names:tc:SAML:2.0:metadata']/*[local-name ()='IDPSSODescriptor' and namespace-uri ()='urn:oasis:names:tc:SAML:2.0:metadata']">
      <svrl:text> Error (04): Each IDPSSODescriptor must contain a signing key as X509Certificate (child element of X509Data) </svrl:text>
    </svrl:failed-assert>
    </svrl:schematron-output>
Internally, lxml.isoschematron generates the validation stylesheets based on the ISO Schematron skeleton reference implementation (the XSLs are included in the lxml.isoschematron package).
Holger
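To look at what Holger describes, the generated validator stylesheet and its SVRL report can be kept for inspection through the `store_xslt` and `store_report` options of `isoschematron.Schematron`. A minimal sketch, assuming a reduced Rule04 schema (IdP rule only) and a hypothetical metadata document; `http://purl.oclc.org/dsdl/svrl` is the standard SVRL namespace behind the `svrl:` prefix in the output above.

```python
from lxml import etree, isoschematron

SVRL_NS = "http://purl.oclc.org/dsdl/svrl"

# Reduced schema: only the IdP rule of Rule04.
SCHEMA = b"""<iso:schema xmlns:iso="http://purl.oclc.org/dsdl/schematron">
  <iso:ns uri="urn:oasis:names:tc:SAML:2.0:metadata" prefix="md"/>
  <iso:ns uri="http://www.w3.org/2000/09/xmldsig#" prefix="ds"/>
  <iso:pattern id="Rule04">
    <iso:rule context="//md:IDPSSODescriptor">
      <iso:assert test="md:KeyDescriptor[@use='signing' or not(@use)]/ds:KeyInfo/ds:X509Data/ds:X509Certificate">Error (04): Each IDPSSODescriptor must contain a signing key as X509Certificate (child element of X509Data)</iso:assert>
    </iso:rule>
  </iso:pattern>
</iso:schema>"""

# Hypothetical IdP metadata without any KeyDescriptor.
DOC = b"""<md:EntityDescriptor xmlns:md="urn:oasis:names:tc:SAML:2.0:metadata"
    entityID="https://idp.example.org/idp.xml">
  <md:IDPSSODescriptor protocolSupportEnumeration="urn:oasis:names:tc:SAML:2.0:protocol"/>
</md:EntityDescriptor>"""

# store_xslt keeps the stylesheet generated from the skeleton implementation;
# store_report keeps the SVRL document produced by the last validation run.
schematron = isoschematron.Schematron(
    etree.fromstring(SCHEMA), store_xslt=True, store_report=True)

is_valid = schematron.validate(etree.fromstring(DOC))
report = schematron.validation_report

# Collect (location, message) pairs from every svrl:failed-assert element.
failed = [
    (fa.get("location"), fa.findtext("{%s}text" % SVRL_NS).strip())
    for fa in report.getroot().iter("{%s}failed-assert" % SVRL_NS)
]
print(is_valid)  # False
for location, text in failed:
    print(location, "->", text)

# The generated validator is an ordinary XSLT document.
xslt_root = schematron.validator_xslt.getroot()
print(xslt_root.tag)
```

Serializing `schematron.validator_xslt` with `etree.tostring()` should yield a plain XSLT 1.0 stylesheet that xsltproc can run too, which may help when comparing the two toolchains.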

Hi Holger, thanks for the guidance towards using lxml.isoschematron, I was missing this. However, one new problem appeared: "ValueError: Unicode strings with encoding declaration are not supported". While I can easily work around that for the Schematron files that are part of the project, the files to be validated are a mix of UTF-8 and various 8-bit flavors (win1252, ...). How do I deal with this in Python 3? Do I have to process the files up front (extract the encoding, convert to UTF-8 and remove the encoding declaration)? The doc on http://lxml.de/parsing.html#python-unicode-strings reads like a Python 2 hack.
- Rainer

Hi,
Why would you need to read from unicode strings? Let lxml parse the files: the XML parser is responsible for the decoding and will do this just fine unless the input files are severely broken. Or read from byte strings, if for some reason you really need to read the file contents yourself: lxml should happily accept byte strings with encoding declarations. In other words, don't decode the binary data you read to unicode before passing it to lxml. If you absolutely cannot do this (but I'd bet you can), you could still manually remove the encoding declaration from the start of the unicode string and then feed it into fromstring().
Holger
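Holger's advice can be sketched in Python 3 as three options, from preferred to last resort. The sample document is hypothetical; ISO-8859-1 is used because libxml2 handles it natively, and it is an assumption that windows-1252 behaves the same where libxml2 was built with iconv support. `strip_xml_declaration` is an illustrative helper, not an lxml function.

```python
import io
import re

from lxml import etree

# Hypothetical 8-bit document carrying an encoding declaration.
text_in_memory = '<?xml version="1.0" encoding="ISO-8859-1"?>\n<doc>Caf\u00e9</doc>'
raw = text_in_memory.encode("iso-8859-1")

# 1. Preferred: let the parser decode, from a file (path or file object) or bytes.
tree = etree.parse(io.BytesIO(raw))  # a file path works the same way
root = etree.fromstring(raw)

# 2. A decoded str that still carries its declaration is rejected.
rejected = False
try:
    etree.fromstring(raw.decode("iso-8859-1"))
except ValueError as exc:
    rejected = True
    print("ValueError:", exc)


# 3. Last resort: drop the declaration from the str, then parse it.
def strip_xml_declaration(xml_text):
    """Remove a leading <?xml ...?> declaration (hypothetical helper)."""
    return re.sub(r"^\s*<\?xml[^>]*\?>\s*", "", xml_text, count=1)


root_from_str = etree.fromstring(strip_xml_declaration(raw.decode("iso-8859-1")))
print(root.text == root_from_str.text == tree.getroot().text)  # True
```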

participants (3)
- Holger Joukl
- Rainer Hoerbe
- Stefan Behnel