[XML-SIG] docbook 5, lxml and rng
Tim.Arnold at sas.com
Mon Jun 1 16:14:53 CEST 2009
> -----Original Message-----
> From: Stefan Behnel [mailto:stefan_ml at behnel.de]
> Sent: Sunday, May 31, 2009 2:05 AM
> To: Tim Arnold
> Cc: xml-sig at python.org
> Subject: Re: [XML-SIG] docbook 5, lxml and rng
> Tim Arnold wrote:
> > Hi, this is a newbie question I'm sure. I'm trying to validate an
> > example straight out of the docbook 5 documentation (example given
> > on the 'inlineequation' page). As it stands, the file doesn't pass
> > as valid.
> > The code:
> > =======================================
> > from lxml import etree
> > import os
> > # RNGDIR = 'path to docbook.rng'
> > # XMLDIR = 'path to the xml file'
> > relaxng_doc = etree.parse(os.path.join(RNGDIR,'docbook.rng'))
> > relaxng = etree.RelaxNG(relaxng_doc)
> > doc = etree.parse(os.path.join(XMLDIR,'myfile.xml'))
> > print(relaxng.validate(doc))
> What does the validator tell you about why it's not considered valid? Note that
> there's a property "error_log" which returns a sequence of messages that
> were collected during validation.
Thanks, I should have read the documentation more carefully before posting. I see what you mean now, and I think I may have an explanation for what's going on.
The error_log says:
4:0:ERROR:RELAXNGV:RELAXNG_ERR_ELEMWRONG: Did not expect element para there
4:0:ERROR:RELAXNGV:RELAXNG_ERR_ELEMNAME: Expecting element example, got para
4:0:ERROR:RELAXNGV:RELAXNG_ERR_ELEMNAME: Expecting element bridgehead, got para
4:0:ERROR:RELAXNGV:RELAXNG_ERR_EXTRACONTENT: Element para has extra content: text
4:0:ERROR:RELAXNGV:RELAXNG_ERR_ELEMNAME: Expecting element annotation, got para
4:0:ERROR:RELAXNGV:RELAXNG_ERR_CONTENTVALID: Element article failed to validate content
But my libxml2 version is 5, which I think means that Schematron isn't supported, and docbook.rng contains some embedded Schematron rules. From the DocBook 5 documentation:
If you want to validate against the DocBook 5 RelaxNG schema, then you have to find the right validation tool. The DocBook 5 RelaxNG schema includes embedded Schematron rules to express certain constraints on some content models. For example, a Schematron rule is added to prevent a sidebar element from containing another sidebar. For complete validation, a validator needs to check both the RelaxNG content models and the Schematron rules.
Does that make sense?
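
For anyone following along, here is a minimal sketch of how messages like the ones above can be pulled out of error_log after a failed validation. The tiny inline schema and document are hypothetical stand-ins so the snippet runs without the DocBook files; they are not the real docbook.rng or my test file.

```python
from lxml import etree

# Hypothetical toy schema: an <article> must contain exactly one <title>.
RNG = b"""<element name="article" xmlns="http://relaxng.org/ns/structure/1.0">
  <element name="title"><text/></element>
</element>"""

# Hypothetical document that breaks the schema (<para> where <title> is expected).
XML = b"<article><para>hello</para></article>"

relaxng = etree.RelaxNG(etree.fromstring(RNG))
doc = etree.fromstring(XML)

# validate() returns True or False and fills error_log as a side effect.
is_valid = relaxng.validate(doc)
print(is_valid)

# Each log entry carries the line number, an error type name and a message.
messages = []
for entry in relaxng.error_log:
    messages.append((entry.line, entry.type_name, entry.message))
    print("%s:%s: %s" % (entry.line, entry.type_name, entry.message))
```

The exact wording and number of messages depend on the libxml2 build, so I would not rely on anything beyond the log being non-empty when validation fails.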