[lxml-dev] Repeating RelaxNG validation
Hi there,

I'm in the process of testing hooks for reporting libxml2's so-called "structured errors" in lxml. Among other things, I want to get a more diagnostic message when a document fails to validate against a RelaxNG schema.

I ran across a problem where an invalid document will fail the validation the first time but succeed when the validation is repeated. I haven't been able to isolate a small test case, unfortunately; my attempts at simplification give the expected results.

I am on OS X 10.4 using lxml pristine from SVN, libxml2 2.6.20 and libxslt 1.1.14 (note: these are *not* the ones that come with the OS but the problem exists there, too).

Attached are the RelaxNG schema, the invalid document, and a failing unit test. I hope they're not too large for the list. I apologize in advance if they are. FYI: the invalid element is /notebook/sheet/ipython-block/para on line 18.

Thank you, in advance!

--
Robert Kern
rkern@ucsd.edu

"In the fields of hell where the grass grows high
 Are the graves of dreams allowed to die."
  -- Richard Harter

<notebook>
  <head>
    <meta content="/usr/local/bin/ipython" name="cmdline"> </meta>
  </head>
  <sheet>
    <title>Scipy Tutorial</title>
    <para>There are two (interchangeable) ways to deal with 1-d polynomials in Scipy. The first is to use the <emphasis role="bold">poly1d</emphasis> class in <emphasis role="bold">scipy_base</emphasis>. This class accepts coefficients or polynomial roots to initialize a polynomial. The polynomial object can then be manipulated in algebraic expressions, integrated, differentiated, and evaluated.
It even prints like a polynomial: </para>
    <ipython-block logid="default-log">
      <para>This shouldn't be here.</para>
      <ipython-cell type="input" number="3"> </ipython-cell>
      <ipython-cell type="input" number="4"> </ipython-cell>
      <ipython-cell type="input" number="5"> </ipython-cell>
      <ipython-cell type="stdout" number="5"> </ipython-cell>
      <ipython-cell type="input" number="6"> </ipython-cell>
      <ipython-cell type="stdout" number="6"> </ipython-cell>
      <ipython-cell type="input" number="7"> </ipython-cell>
      <ipython-cell type="stdout" number="7"> </ipython-cell>
      <ipython-cell type="input" number="8"> </ipython-cell>
      <ipython-cell type="output" number="8"> </ipython-cell>
      <ipython-cell type="input" number="9"> </ipython-cell>
      <ipython-cell type="output" number="9"> </ipython-cell>
    </ipython-block>
    <para>The other way to handle polynomials is an array of coefficients with the first element of the array giving the coefficient of the highest power. There are explicit functions to add, subtract, multiply, divide, integrate, differentiate, and evalute polynomials represented as sequences of coefficients.
    </para>
  </sheet>
  <ipython-log id="default-log"><cell number="3"><input> from scipy import * </input></cell>
    <cell number="4"><input> p = poly1d([3,4,5]) </input></cell>
    <cell number="5"><input> print p </input><stdout> 2 3 x + 4 x + 5 </stdout></cell>
    <cell number="6"><input> print p*p </input><stdout> 4 3 2 9 x + 24 x + 46 x + 40 x + 25 </stdout></cell>
    <cell number="7"><input> print p.integ(k=6) </input><stdout> 3 2 x + 2 x + 5 x + 6 </stdout></cell>
    <cell number="8"><input> p.deriv() </input><output>poly1d([6, 4]) </output></cell>
    <cell number="9"><input> p([4,5]) </input><output>array([ 69, 100]) </output></cell>
  </ipython-log>
</notebook>

import unittest

from lxml import etree


class DuplicateRelaxNGTestCase(unittest.TestCase):
    def test_once(self):
        rng = etree.RelaxNG(file='nbk.rng')
        baddoc = etree.parse('tut-2.3.5-bad.nbk')
        self.assert_(not rng.validate(baddoc))
        # try again
        self.assert_(not rng.validate(baddoc))


def test_suite():
    suite = unittest.TestSuite()
    suite.addTests([unittest.makeSuite(DuplicateRelaxNGTestCase)])
    return suite


if __name__ == '__main__':
    unittest.main()
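The symptom described above (a document rejected on the first validation and accepted on the second) can be checked for any validator with a small helper. The following is only a sketch: `validate_repeatedly`, `is_repeatable`, and `FlakyValidator` are hypothetical names that do not come from lxml or from the attached test. `FlakyValidator` is a stand-in that imitates the reported behaviour so the example runs without the schema and document files.

```python
from typing import Any, Callable, List


def validate_repeatedly(validate: Callable[[Any], bool], doc: Any,
                        runs: int = 2) -> List[bool]:
    """Call the same validation function `runs` times on the same document."""
    return [bool(validate(doc)) for _ in range(runs)]


def is_repeatable(results: List[bool]) -> bool:
    """True when every run produced the same verdict."""
    return len(set(results)) <= 1


class FlakyValidator:
    """Hypothetical stand-in that imitates the reported bug:
    it rejects the document on the first call and accepts it afterwards."""

    def __init__(self) -> None:
        self.calls = 0

    def validate(self, doc: Any) -> bool:
        self.calls += 1
        return self.calls > 1


if __name__ == "__main__":
    results = validate_repeatedly(FlakyValidator().validate, object())
    print("verdicts:", results)
    print("repeatable:", is_repeatable(results))
```

With lxml installed, the bound method `rng.validate` from the unit test above could presumably be passed in place of `FlakyValidator().validate`, with `baddoc` as the document.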
Hi there,

Robert Kern wrote:
I'm in the process of testing hooks for reporting libxml2's so-called "structured errors" in lxml. Among other things, I want to get a more diagnostic message when a document fails to validate against a RelaxNG schema.
This sounds very cool! Looking forward to seeing your work on this.
I ran across a problem where an invalid document will fail the validation the first time but succeed when the validation is repeated. I haven't been able to isolate a small test case, unfortunately; my attempts at simplification give the expected results.
Okay, thanks for reporting it, and for trying to whittle this one down! If nobody steps up before I do, I will test this one when I get some free time; I'm still swamped with work, so please be patient. Others are of course welcome to look into this one (looking hopefully at Kasimier :).

Regards,

Martijn
Robert Kern wrote:
I'm in the process of testing hooks for reporting libxml2's so-called "structured errors" in lxml. Among other things, I want to get a more diagnostic message when a document fails to validate against a RelaxNG schema.
I ran across a problem where an invalid document will fail the validation the first time but succeed when the validation is repeated. I haven't been able to isolate a small test case, unfortunately; my attempts at simplification give the expected results.
I am on OS X 10.4 using lxml pristine from SVN, libxml2 2.6.20 and libxslt 1.1.14 (note: these are *not* the ones that come with the OS but the problem exists there, too).
Attached are the RelaxNG schema, the invalid document, and a failing unit test. I hope they're not too large for the list. I apologize in advance if they are. FYI: the invalid element is /notebook/sheet/ipython-block/para on line 18.
Hey,

I can reproduce this, but am at a loss. It may point to a problem with libxml2 itself. It is of course possible that this bug was fixed in 2.6.21, which was released recently, though on the other hand I don't see any mention of Relax NG fixes in the release notes.

Would you be willing to write a C version of this and post a message to the libxml2 mailing list? If not, let me know, and I'll try to write a C version and report it myself.

Regards,

Martijn
Martijn Faassen wrote:
Would you be willing to write a C version of this and post a message to the libxml2 mailing list? If not, let me know, and I'll try to write a C version and report it myself.
Yup, it's a libxml2 bug. Apply the attached patch to testRelax.c in the libxml2-2.6.21 distribution (possibly earlier ones, too).

[test]$ ~/src/libxml2-2.6.21/testRelax nbk.rng tut-2.3.5-bad.nbk
tut-2.3.5-bad.nbk:17: element ipython-block: Relax-NG validity error : Expecting element ipython-cell, got para
tut-2.3.5-bad.nbk:17: element ipython-block: Relax-NG validity error : Element ipython-block failed to validate content
Relax-NG validity error : Extra element ipython-block in interleave
tut-2.3.5-bad.nbk:17: element ipython-block: Relax-NG validity error : Element sheet failed to validate content
tut-2.3.5-bad.nbk fails to validate
tut-2.3.5-bad.nbk validates

The same problem occurs with the pair of files test/relaxng/docbook{.rng,_0.xml} from the libxml2 source distribution.

--
Robert Kern
rkern@ucsd.edu

"In the fields of hell where the grass grows high
 Are the graves of dreams allowed to die."
  -- Richard Harter

--- libxml2-2.6.21/testRelax.c 2005-01-04 06:49:47.000000000 -0800
+++ testRelax.c 2005-09-09 19:34:34.000000000 -0700
@@ -143,6 +143,23 @@
             int ret;
 
             ctxt = xmlRelaxNGNewValidCtxt(schema);
+            xmlRelaxNGSetValidErrors(ctxt,
+                    (xmlRelaxNGValidityErrorFunc) fprintf,
+                    (xmlRelaxNGValidityWarningFunc) fprintf,
+                    stderr);
+            ret = xmlRelaxNGValidateDoc(ctxt, doc);
+            if (ret == 0) {
+                printf("%s validates\n", argv[i]);
+            } else if (ret > 0) {
+                printf("%s fails to validate\n", argv[i]);
+            } else {
+                printf("%s validation generated an internal error\n",
+                       argv[i]);
+            }
+            xmlRelaxNGFreeValidCtxt(ctxt);
+
+            /* Try again */
+            ctxt = xmlRelaxNGNewValidCtxt(schema);
             xmlRelaxNGSetValidErrors(ctxt,
                     (xmlRelaxNGValidityErrorFunc) fprintf,
                     (xmlRelaxNGValidityWarningFunc) fprintf,
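Since the thread started from the wish for more structured error reports, here is a rough sketch of how the textual messages printed by testRelax above could be split into fields. It is not how libxml2's structured-error API works; it only pattern-matches the line format visible in the output quoted in this message. `ValidityError` and `parse_validity_error` are hypothetical names introduced for illustration.

```python
import re
from typing import NamedTuple, Optional


class ValidityError(NamedTuple):
    filename: Optional[str]  # None when the message carries no location prefix
    line: Optional[int]
    element: Optional[str]
    message: str


# Assumed format, taken only from the output shown in this thread:
#   <file>:<line>: element <name>: Relax-NG validity error : <message>
# The location prefix is optional (see the "Extra element" line).
_PATTERN = re.compile(
    r"^(?:(?P<filename>\S+):(?P<line>\d+): element (?P<element>[^:]+): )?"
    r"Relax-NG validity error : (?P<message>.*)$"
)


def parse_validity_error(text: str) -> Optional[ValidityError]:
    """Return the parsed fields, or None if the line is not a validity error."""
    match = _PATTERN.match(text.strip())
    if match is None:
        return None
    line = match.group("line")
    return ValidityError(
        filename=match.group("filename"),
        line=int(line) if line is not None else None,
        element=match.group("element"),
        message=match.group("message"),
    )


if __name__ == "__main__":
    sample = [
        "tut-2.3.5-bad.nbk:17: element ipython-block: Relax-NG validity error : "
        "Expecting element ipython-cell, got para",
        "Relax-NG validity error : Extra element ipython-block in interleave",
        "tut-2.3.5-bad.nbk fails to validate",
    ]
    for text in sample:
        print(parse_validity_error(text))
```

Parsing text like this is fragile; it is shown only to make the shape of the reported errors explicit.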
Hi,

On Fri, 2005-09-09 at 19:58 -0700, Robert Kern wrote:
Martijn Faassen wrote:
Would you be willing to write a C version of this and post a message to the libxml2 mailing list? If not, let me know, and I'll try to write a C version and report it myself.
Yup, it's a libxml2 bug. Apply the attached patch to testRelax.c in the libxml2-2.6.21 distribution (possibly earlier ones, too).
[...]

Daniel Veillard wants to ship a new release today, since there was an API breaker somewhere in 2.6.21... so if you send the patch to libxml2, it might get into this release.

Regards,

Kasimier
Kasimier Buchcik wrote:
On Fri, 2005-09-09 at 19:58 -0700, Robert Kern wrote:
Martijn Faassen wrote:
Would you be willing to write a C version of this and post a message to the libxml2 mailing list? If not, let me know, and I'll try to write a C version and report it myself.
Yup, it's a libxml2 bug. Apply the attached patch to testRelax.c in the libxml2-2.6.21 distribution (possibly earlier ones, too).
Thanks for doing this work, Robert!
[...]
Daniel Veillard wants to ship a new release today, since there was an API breaker somewhere in 2.6.21... so if you send the patch to libxml2, it might get into this release.
I think the patch only demonstrates the problem and does not fix it. That said, Kasimier's suggestion to Robert to report this to the libxml2 developers still stands. (I haven't checked Bugzilla, but I can't find anything about this on the libxml2 mailing list right now.)

Regards,

Martijn
Martijn Faassen wrote:
I think the patch only demonstrates the problem and does not fix it. That said, Kasimier's suggestion to Robert to report this to the libxml2 developers still stands. (I haven't checked bugzilla but can't find anything on the libxml2 mailing list on this right now)
I submitted it to Bugzilla.

--
Robert Kern
rkern@ucsd.edu

"In the fields of hell where the grass grows high
 Are the graves of dreams allowed to die."
  -- Richard Harter
participants (3)
- Kasimier Buchcik
- Martijn Faassen
- Robert Kern