cannot load Schematron stylesheet: "<string>:0:0:ERROR:XSLT:ERR_OK: unknown error"

Rainer Hoerbe

31 Mar 2016

4:42 p.m.

I am able to reproduce erroneous behavior with Python 2.7 and 3.4 + lxml 3.6.0. Processing is correct when using xsltproc directly:

    import lxml.etree as ET
    sf = 'rules/schtron/rule04E.xsl'
    xslt = ET.fromstring(open(sf).read())
    transform = ET.XSLT(xslt)
    # at this point transform.error_log.last_error contains
    # "<string>:0:0:ERROR:XSLT:ERR_OK: unknown error"

The stylesheet is:
https://github.com/rhoerbe/saml_schematron/blob/master/rules/schtron/rule04E...
and has been generated from this schematron:
https://github.com/rhoerbe/saml_schematron/blob/master/rules/schtron/rule04E...

The code works with other (simple?) stylesheets. Are there any restrictions in lxml that prevent the processing of ISO-schematron stylesheets? Or is this a bug that should be filed?

Regards,
Rainer

Holger Joukl

1 Apr

4:24 a.m.

Hi,

...

I am able to reproduce an erroneous behavior with python 2.7 and 3.4 + lxml 3.6.0. Processing is correct when using xsltproc directly:

    import lxml.etree as ET
    sf = 'rules/schtron/rule04E.xsl'
    xslt = ET.fromstring(open(sf).read())
    transform = ET.XSLT(xslt)
    # at this point transform.error_log.last_error contains
    # "<string>:0:0:ERROR:XSLT:ERR_OK: unknown error"

Hm, this sample code doesn't apply the stylesheet to an input document?

...

The stylesheet is:
https://github.com/rhoerbe/saml_schematron/blob/master/rules/schtron/rule04E.xsl
and has been generated from this schematron:
https://github.com/rhoerbe/saml_schematron/blob/master/rules/schtron/rule04E.sch

Can't reproduce:

    Python 2.7.5 (default, Aug 12 2013, 15:01:02)
    [GCC 4.7.3] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from lxml import etree
    >>> print etree.__version__
    3.6.0
    >>> xsl_file = "rule04E.xsl"
    >>> xsl_doc = etree.parse(xsl_file)
    >>> transform = etree.XSLT(xsl_doc)
    >>> print transform.error_log

    >>> print transform.error_log.last_error
    None
    >>> from lxml import isoschematron
    >>> sch_file = "rule04E.sch"
    >>> sch_doc = etree.parse(sch_file)
    >>> schematron = isoschematron.Schematron(sch_doc, store_xslt=True)
    >>> print type(schematron.validator_xslt)
    >>> print schematron._validator.error_log

    >>> print schematron._validator.error_log.last_error
    None

Of course, neither the XSL transformation nor the schematron validation is actually applied to any input document here, so an empty error log is probably to be expected.

(Note that I needed to add the iso:schema root element to rule04E.sch and fix the pattern id attribute to make it a valid stand-alone schematron schema, processable with isoschematron.Schematron, i.e.

http://purl.oclc.org/dsdl/schematron"> http://www.w3.org/2000/09/xmldsig#" prefix="ds"/>
Error (04): Each IDPSSODescriptor must contain a signing key as X509Certificate (child element of X509Data)
Error (04): Each SPSSODescriptor must contain a signing key as X509Certificate (child element of X509Data)
)

Holger

Landesbank Baden-Wuerttemberg Anstalt des oeffentlichen Rechts Hauptsitze: Stuttgart, Karlsruhe, Mannheim, Mainz HRA 12704 Amtsgericht Stuttgart

Rainer Hoerbe

8:35 a.m.

Hi Holger

...

...
    import lxml.etree as ET
    sf = 'rules/schtron/rule04E.xsl'
    xslt = ET.fromstring(open(sf).read())
    transform = ET.XSLT(xslt)
    # at this point transform.error_log.last_error contains
    # "<string>:0:0:ERROR:XSLT:ERR_OK: unknown error"

...

Hm, this sample code doesn't apply the stylesheet to an input document?

I left out the further code because I think that the error is in the XSLT object, but I may be wrong. Here it is:

    df = 'testdata/idp_incomplete.xml'
    md_dom = ET.parse(df)
    out_dom = transform(md_dom)
    print(ET.tostring(out_dom, xml_declaration=False, encoding='utf-8'))

...

Can't reproduce:

Actually I have a docker container that demonstrates the problem. Steps (with docker installed):

    git clone https://github.com/rhoerbe/saml_schematron.git
    cd saml_schematron/docker
    ./build.sh
    docker run -it --name xslttest r2h2/samlschtron4 bash

Then, in the container:

    cd /opt/saml_schematron/
    python3.4
    # exec the above code, it will output "None"
    exit()
    xsltproc rules/schtron/rule04E.xsl testdata/idp_incomplete.xml  # show expected behavior

Thanks, your help is appreciated.

- Rainer


_________________________________________________________________ Mailing list for the lxml Python XML toolkit - http://lxml.de/ lxml@lxml.de https://mailman-mail5.webfaction.com/listinfo/lxml

Holger Joukl

10:40 a.m.

...

I left out the further code because I think that the error is in the XSLT object, but I may be wrong. Here it is:

    df = 'testdata/idp_incomplete.xml'
    md_dom = ET.parse(df)
    out_dom = transform(md_dom)
    print(ET.tostring(out_dom, xml_declaration=False, encoding='utf-8'))

No, the actual transform leads to the error, see below.

...

    python3.4
    # exec the above code, it will output "None"
    exit()
    xsltproc rules/schtron/rule04E.xsl testdata/idp_incomplete.xml  # show expected behavior

Corporate env and no quick way to fire up a docker install here. Anyway, I can confirm xsltproc & lxml indeed behave differently on your rule04E.xsl:

    $ xsltproc rule04E.xsl idp_incomplete.xml
    Info: Validating entityID https://idp.example.org/idp.xml XPATH: /md:EntitiesDescriptor[1] /md:EntityDescriptor[1] validation rule: (@entityID)
    Error (04): Each IDPSSODescriptor must contain a signing key as X509Certificate (child element of X509Data) XPATH: /md:EntitiesDescriptor[1] /md:EntityDescriptor[1] /md:IDPSSODescriptor[1] validation rule: (md:KeyDescriptor[@use='signing' or not(@use)]/ds:KeyInfo/ds:X509Data/ds:X509Certificate)

    $ python2.7 -c 'from lxml import etree; xsl = etree.XSLT(etree.parse("rule04E.xsl")); output = xsl(etree.parse("idp_incomplete.xml")); print output; print xsl.error_log'
    <string>:0:0:ERROR:XSLT:ERR_OK: unknown error
    <string>:0:0:ERROR:XSLT:ERR_OK: unknown error

Strangely, if I just load rule04E.xsl in oXygen XML editor, reindent & save, I can now successfully run:

    $ python2.7 -c 'from lxml import etree; xsl = etree.XSLT(etree.parse("rule04E_oxygen.xsl")); output = xsl(etree.parse("idp_incomplete.xml")); print output; print xsl.error_log'
    <string>:0:0:ERROR:XSLT:ERR_OK: Info: Validating entityID https://idp.example.org/idp.xml XPATH: /md:EntitiesDescriptor[1] /md:EntityDescriptor[1] validation rule: (@entityID)
    <string>:0:0:ERROR:XSLT:ERR_OK: Error (04): Each IDPSSODescriptor must contain a signing key as X509Certificate (child element of X509Data) XPATH: /md:EntitiesDescriptor[1] /md:EntityDescriptor[1] /md:IDPSSODescriptor[1] validation rule: (md:KeyDescriptor[@use='signing' or not(@use)]/ds:KeyInfo/ds:X509Data/ds:X509Certificate)

(oXygen's reindent does what it says, basically pretty-printing, so it may well remove whitespace & whatnot.)

No idea what makes etree.XSLT() choke on the original xsl file, in contrast to xsltproc.

Holger

Rainer Hoerbe

4:44 p.m.

...

On 01.04.2016 at 17:40, Holger Joukl wrote:

Anyway, I can confirm xsltproc & lxml indeed behave differently on your rule04E.xsl:

I have dozens of these where lxml fails :-(

...

<string>:0:0:ERROR:XSLT:ERR_OK: unknown error

Strangely, if I just load rule04E.xsl in oXygen XML editor, reindent & save I can now successfully run:

Please could you provide me with the edited file (pastebin.com etc.)? Thanks, Rainer

Holger Joukl

4 Apr

1:19 a.m.

...

...
Strangely, if I just load rule04E.xsl in oXygen XML editor, reindent & save I can now successfully run:

Please could you provide me with the edited file (pastebin.com etc.)?

Sorry, all things looking like "internet storage" are generally blocked here for me. Maybe you could rather download oXygen and try this out yourself; it comes fully functional with an evaluation period.

Btw have you tried lxml's isoschematron functionality directly with the original stand-alone schematron schema or an xsd with included schematron rules as input?

Holger

Rainer Hoerbe

5 Apr

11:54 a.m.

Resolved. I failed to use the stylesheet's error_log attribute. I should have noticed, as xsltproc is writing to stderr in a similar manner. Thank you for providing me with the minimized example; it helped me to find this out.

The lxml output is a bit ugly, like "<string>:0:0:ERROR:XSLT:ERR_OK: unknown error" instead of an empty string, but I can live with that, and it saves me from using Xerces or rewriting the schematron rules as xslt or xpath.

- Rainer


Stefan Behnel

3:02 p.m.

Rainer Hoerbe wrote on 05.04.2016 at 18:54:

...

The lxml output is a bit ugly, like "<string>:0:0:ERROR:XSLT:ERR_OK: unknown error" instead of an empty string, but I can live with that, and it saves me from using Xerces or rewriting the schematron rules as xslt or xpath.

Note that the result is not just a string but a log entry object. You can ask it for its details programmatically.

The (admittedly ugly) fact that it says both "ERR_OK" and "unknown error" is because the error reporting interface in libxslt is, well, somewhat suboptimal.

Stefan
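A minimal sketch of asking the log entries for their details, along the lines Stefan describes. The stylesheet, its message text, and the helper name collect_messages are made up for illustration; the entry attributes used (message, level_name, domain_name, type_name, line) are the ones lxml documents for its error log entries.

```python
from lxml import etree

# Hypothetical stylesheet: it emits a single xsl:message and nothing else.
STYLESHEET = b"""\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <xsl:message>Info: checking document</xsl:message>
  </xsl:template>
</xsl:stylesheet>
"""


def collect_messages(stylesheet_bytes, document_bytes):
    """Run the transform and return its log entries as plain dicts."""
    transform = etree.XSLT(etree.fromstring(stylesheet_bytes))
    transform(etree.fromstring(document_bytes))
    # error_log is iterable; each entry exposes its parts as attributes.
    return [
        {
            "message": entry.message,
            "level": entry.level_name,
            "domain": entry.domain_name,
            "type": entry.type_name,
            "line": entry.line,
        }
        for entry in transform.error_log
    ]


entries = collect_messages(STYLESHEET, b"<doc/>")
for item in entries:
    print(item)
```

Reading entry.message directly avoids parsing the formatted "<string>:0:0:ERROR:XSLT:ERR_OK: ..." line; the level, domain and type fields should account for the ERROR, XSLT and ERR_OK parts of that line.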

Rainer Hoerbe

7 Apr

2:49 p.m.

I was now able to reproduce with several unit tests that reformatting the style sheet gets rid of the bug where lxml reports "<string>:0:0:ERROR:XSLT:ERR_OK: unknown error" when it should not. Creating style sheets with Xerces/J instead of xsltproc did not help either.

I reported this as bug #1567633.

- Rainer


Holger Joukl

8 Apr

1:54 a.m.

Hi,

...

I was now able to reproduce with several unit tests that reformatting the style sheet gets rid of the bug where lxml reports "<string>:0:0:ERROR:XSLT:ERR_OK: unknown error" when it should not. Creating style sheets with Xerces/J instead of xsltproc did not help either.

I reported this as bug #1567633.

- Rainer

I still don't quite follow. There does seem to be different behaviour between xsltproc and etree.XSLT, but how do you create the validating XSLs from the schematron schema? It looks like this works just fine if I use lxml.isoschematron directly on the schematron schema:

    >>> from lxml import etree, isoschematron
    >>> print etree.__version__
    3.6.0
    >>> print etree.LIBXML_VERSION, etree.LIBXML_COMPILED_VERSION
    (2, 9, 1) (2, 9, 1)
    >>> print etree.LIBXSLT_VERSION, etree.LIBXSLT_COMPILED_VERSION
    (1, 1, 28) (1, 1, 28)
    >>> schematron = isoschematron.Schematron(file="/tmp/rule04E_complete.sch", store_report=True)
    >>> doc = etree.parse("/tmp/rule4_test1_idp_missing_key.xml")
    >>> schematron.validate(doc)
    False
    >>> print schematron.validation_report
    <?xml version="1.0" standalone="yes"?>
    <svrl:schematron-output xmlns:svrl="http://purl.oclc.org/dsdl/svrl"
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        xmlns:schold="http://www.ascc.net/xml/schematron"
        xmlns:sch="http://www.ascc.net/xml/schematron"
        xmlns:iso="http://purl.oclc.org/dsdl/schematron"
        xmlns:md="urn:oasis:names:tc:SAML:2.0:metadata"
        xmlns:ds="http://www.w3.org/2000/09/xmldsig#"
        xmlns:rpi="urn:oasis:names:tc:SAML:metadata:rpi"
        xmlns:mdui="urn:oasis:names:tc:SAML:metadata:ui"
        xmlns:alg="urn:oasis:names:tc:SAML:metadata:algsupport"
        xmlns:saml="urn:oasis:names:tc:SAML:2.0:assertion"
        xmlns:idpdisc="urn:oasis:names:tc:SAML:profiles:SSO:idp-discovery-protocol"
        xmlns:mdattr="urn:oasis:names:tc:SAML:metadata:attribute"
        xmlns:init="urn:oasis:names:tc:SAML:profiles:SSO:request-init"
        title="" schemaVersion="ISO19757-3">
      <svrl:ns-prefix-in-attribute-values uri="http://www.w3.org/2000/09/xmldsig#" prefix="ds"/>
      ...
      <svrl:text>Error (04): Each IDPSSODescriptor must contain a signing key as X509Certificate (child element of X509Data)</svrl:text>
      ...

Internally, lxml.isoschematron generates the validation stylesheets based on the isoschematron skeleton reference implementation (the XSLs are included in the lxml.isoschematron package).

Holger
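A minimal, self-contained sketch of the lxml.isoschematron route shown above. The schema, the sample documents, and the helper name check are hypothetical stand-ins for the SAML rules; only the Schematron(..., store_report=True), validate() and validation_report calls come from the thread.

```python
import io

from lxml import etree, isoschematron

# Hypothetical stand-alone ISO Schematron schema with a single rule.
SCHEMA = b"""\
<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <pattern id="item_has_id">
    <rule context="item">
      <assert test="@id">Each item must carry an id attribute.</assert>
    </rule>
  </pattern>
</schema>
"""

SVRL_NS = {"svrl": "http://purl.oclc.org/dsdl/svrl"}

schematron = isoschematron.Schematron(
    etree.parse(io.BytesIO(SCHEMA)), store_report=True
)


def check(xml_bytes):
    """Validate a document; return (is_valid, failed assertion texts)."""
    doc = etree.parse(io.BytesIO(xml_bytes))
    is_valid = schematron.validate(doc)
    # The SVRL report records every failed assertion as svrl:failed-assert.
    texts = schematron.validation_report.xpath(
        "//svrl:failed-assert/svrl:text/text()", namespaces=SVRL_NS
    )
    return is_valid, [str(text).strip() for text in texts]


print(check(b"<items><item id='a'/></items>"))
print(check(b"<items><item id='a'/><item/></items>"))
```

Working from the .sch file this way skips the intermediate stylesheet entirely, so no separately generated XSL has to be loaded with etree.XSLT.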

Rainer Hoerbe

5 a.m.

Hi Holger,

Thanks for the guidance towards using lxml.isoschematron - I was missing this. However, one new problem appeared: "ValueError: Unicode strings with encoding declaration are not supported". While I can easily work around that with the schematron files which are part of the project, the files to be validated are both UTF-8 and various 8-bit flavors (win1252, ..). How to deal with this in Python 3? Do I have to process the files upfront (extract the encoding, convert to UTF-8 and remove the encoding declaration)? The doc on http://lxml.de/parsing.html#python-unicode-strings reads like a Python 2 hack.

- Rainer

Holger Joukl

5:31 a.m.

Hi,

...

However, one new problem appeared: "ValueError: Unicode strings with encoding declaration are not supported". While I can easily work around that with the schematron files which are part of the project, the files to be validated are both UTF-8 and various 8-bit flavors (win1252, ..). How to deal with this in Python 3? Do I have to process the files upfront (extract the encoding, convert to UTF-8 and remove the encoding declaration)? The doc on http://lxml.de/parsing.html#python-unicode-strings reads like a Python 2 hack.

Why would you need to read from unicode strings?

Let lxml parse the files: The XML parser is responsible for doing the decoding and will do this just fine unless the input files are severely broken.

Or read from byte strings, if for some reason you really need to read the file contents yourself. lxml should happily accept byte strings with encoding declarations. In other words, don't decode the read binary data to unicode before passing to lxml.

If you absolutely cannot do this - but I'd bet you can - you could still manually remove the encoding declaration from the start of the unicode string and then feed into fromstring().

Holger
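A minimal sketch of the advice above, using an ISO-8859-1 document as a stand-in for the 8-bit inputs mentioned in the question. The sample content and the temporary file name are hypothetical.

```python
import os
import tempfile

from lxml import etree

# Hypothetical sample: an ISO-8859-1 document with an encoding declaration.
RAW = (
    '<?xml version="1.0" encoding="ISO-8859-1"?><name>Caf\u00e9</name>'
).encode("iso-8859-1")

# 1) Bytes in: the parser reads the declaration and decodes the data itself.
root_from_bytes = etree.fromstring(RAW)

# 2) File in: etree.parse() reads the raw bytes, so no manual decoding.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.xml")
    with open(path, "wb") as fh:
        fh.write(RAW)
    root_from_file = etree.parse(path).getroot()

# 3) Decoded str in: the declaration is still present, and lxml refuses it.
try:
    etree.fromstring(RAW.decode("iso-8859-1"))
    str_error = None
except ValueError as exc:
    str_error = str(exc)

# ascii() keeps the output printable on any console encoding.
print(ascii(root_from_bytes.text), ascii(root_from_file.text))
print(str_error)
```

The first two variants leave the decoding to the parser regardless of which encoding each file declares, so mixed UTF-8 and 8-bit inputs need no preprocessing.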

Rainer Hoerbe

6:23 a.m.

...

Why would you need to read from unicode strings?

Let lxml parse the files: The XML parser is responsible for doing the decoding and will do this just fine unless the input files are severely broken.

Or read from byte strings, if for some reason you really need to read the file contents yourself. lxml should happily accept byte strings with encoding declarations. In other words, don't decode the read binary data to unicode before passing to lxml.

If you absolutely cannot do this - but I'd bet you can - you could still manually remove the encoding declaration from the start of the unicode string and then feed into fromstring().

Holger

Got it - I need to parse the file directly. Thanks, Rainer
