[lxml-dev] RelocatableRelaxNG
Hi!

Anyone annoyed by the fact that RNG grammars need a fixed starting point? That you can't just validate some subtree that you're interested in?

Well, here we go.

Stefan

Index: src/lxml/relaxng.pxi
===================================================================
--- src/lxml/relaxng.pxi	(revision 19820)
+++ src/lxml/relaxng.pxi	(working copy)
@@ -59,3 +59,53 @@
         if ret == -1:
             raise RelaxNGValidateError, "Internal error in Relax NG validation"
         return ret == 0
+
+cdef object _build_relocation_stylesheet():
+    return XSLT(ElementTree(XML('''\
+<xsl:stylesheet version="1.0"
+    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
+    xmlns:rng="http://relaxng.org/ns/structure/1.0">
+  <xsl:template match="rng:start/rng:ref">
+    <xsl:copy>
+      <xsl:attribute name="name">
+        <xsl:value-of select="$newref"/>
+      </xsl:attribute>
+    </xsl:copy>
+  </xsl:template>
+  <xsl:template match="text()">
+    <xsl:value-of select="normalize-space()"/>
+  </xsl:template>
+  <xsl:template match="*">
+    <xsl:copy><xsl:copy-of select="@*"/><xsl:apply-templates /></xsl:copy>
+  </xsl:template>
+</xsl:stylesheet>
+''')))
+
+class RelocatableRelaxNG:
+    def __init__(self, tree, start=None):
+        self._tree = tree
+        if start is not None:
+            start = "'%s'" % start
+        self._start = start
+        try:
+            self.__class__._RELOCATION_XSLT
+        except AttributeError:
+            self.__class__._RELOCATION_XSLT = _build_relocation_stylesheet()
+
+    def validate(self, xml_tree):
+        if self._start is None:
+            rng_tree = self._tree
+        else:
+            rng_tree = self._RELOCATION_XSLT.apply(self._tree,
+                                                   newref=self._start)
+        rng = RelaxNG(rng_tree)
+        self.validate = rng.validate # replace this method by the real thing
+        return rng.validate(xml_tree)
+
+    def copy(self, start=None):
+        return self.__class__(self._tree, start)
+
+    def relocate(self, start):
+        self._start = start
+        try: del self.validate
+        except AttributeError: pass
Index: src/lxml/tests/test_relaxng.py
===================================================================
--- src/lxml/tests/test_relaxng.py	(revision 19831)
+++ src/lxml/tests/test_relaxng.py	(working copy)
@@ -111,7 +111,39 @@
         self.assertEqual(self._rootstring(b_tree), '<b>B</b>')
         self.assert_(schema.validate(b_tree))
 
+    def test_relaxng_relocation(self):
+        schema = self.parse('''\
+<grammar xmlns="http://relaxng.org/ns/structure/1.0">
+  <start><ref name="a"/></start>
+  <define name="a">
+    <element name="a">
+      <ref name="b"/>
+    </element>
+  </define>
+  <define name="b">
+    <element name="b">
+      <ref name="c"/>
+    </element>
+  </define>
+  <define name="c">
+    <element name="c">
+      <empty/>
+    </element>
+  </define>
+</grammar>
+''')
+        schema1 = etree.RelocatableRelaxNG(schema)
+        tree1 = self.parse('<a><b><c/></b></a>')
+        self.assert_(schema1.validate(tree1))
+
+        schema2 = schema1.copy(start='b')
+        tree2 = self.parse('<b><c/></b>')
+        self.assert_(schema2.validate(tree2))
+        self.assert_(schema1.validate(tree1))
+        self.assertFalse(schema1.validate(tree2))
+        self.assertFalse(schema2.validate(tree1))
+
 
 def test_suite():
     suite = unittest.TestSuite()
     suite.addTests([unittest.makeSuite(ETreeRelaxNGTestCase)])

Stefan Behnel wrote:
> Hi!
> Anyone annoyed by the fact that RNG grammars need a fixed starting point? That you can't just validate some subtree that you're interested in?
> Well, here we go.
Cool feature!

Relocatable is a bit of a technical sounding word; I wonder whether we can come up with a better name.

Is this XSLT stylesheet of your own devising or did this come from somewhere else? How robust is this in the face of namespaces, Relax NG schemas that consist of multiple files, etc?

I also wonder whether we could make the API such that you don't have to copy stylesheets yourself, instead having this happen for you on the fly when you choose to validate a part. This though may require a bit of caching that the explicit form doesn't need.

Regards,
Martijn

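The on-the-fly variant with caching suggested here could look roughly like the sketch below. It is only an illustration and not part of the patch: `RelocatingValidatorCache`, `build_validator` and `fake_builder` are hypothetical names, and the builder is a stand-in for "transform the grammar, then compile it" so that the example runs without lxml.

```python
from typing import Callable, Dict, Optional

Validator = Callable[[object], bool]


class RelocatingValidatorCache:
    """Hypothetical helper: build one validator per start name and reuse it."""

    def __init__(self, build_validator: Callable[[Optional[str]], Validator]):
        # build_validator(start) is assumed to return a callable that
        # validates a tree against the grammar relocated to `start`.
        self._build_validator = build_validator
        self._cache: Dict[Optional[str], Validator] = {}

    def validate(self, xml_tree: object, start: Optional[str] = None) -> bool:
        validator = self._cache.get(start)
        if validator is None:
            # First request for this start name: build and remember it.
            validator = self._build_validator(start)
            self._cache[start] = validator
        return validator(xml_tree)


build_calls = []


def fake_builder(start):
    # Stand-in builder: records each build and "validates" by comparing
    # a root tag name against the expected start element.
    build_calls.append(start)
    expected_root = "a" if start is None else start
    return lambda root_tag: root_tag == expected_root


cache = RelocatingValidatorCache(fake_builder)
results = [
    cache.validate("a"),
    cache.validate("b", start="b"),
    cache.validate("b", start="b"),  # served from the cache, no rebuild
    cache.validate("a", start="b"),
]
```

The trade-off is the one raised above: the cache keeps one compiled validator per start name alive, which the explicit `copy(start=...)` form leaves under the caller's control.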
Martijn Faassen wrote:
> Stefan Behnel wrote:
>> Anyone annoyed by the fact that RNG grammars need a fixed starting point? That you can't just validate some subtree that you're interested in?
>> Well, here we go.
> Cool feature!
yup! :)
> Relocatable is a bit of a technical sounding word; I wonder whether we can come up with a better name.
Would be nice. Sounds too long and clumsy, I know.
> Is this XSLT stylesheet of your own devising or did this come from somewhere else? How robust is this in the face of namespaces, Relax NG schemas that consist of multiple files, etc?
I wrote it myself. There are actually loads of ways of writing it that all do more or less the same thing.

Multiple files are obviously a problem if the <start> tag is not in the top-level one (the one that is transformed), because then it can't be replaced. Namespaces are not a problem for the stylesheet.

I have found a problem, though, that may (or may not) be related to namespaces. It's in both the trunk and my branch (I'm always happy when I can say that) and is pretty weird. All test cases that I could come up with so far work perfectly well, but I have one RNG (not even that long, but much too long for a test case) that the RelaxNG class fails to read after transformation. libxml2 writes out the error message "(none) is empty", which is obviously true :), but "(none)" here refers to the RNG tree, so it may mean that libxml2's xmlRelaxNGCleanupTree function (which is called on the RNG tree before transforming it into a validator) throws away the entire tree for some reason. May or may not be a namespace issue.

When you serialize the result tree and then parse it again, everything works perfectly well, so maybe running the XSLT somehow corrupts the internal representation of the tree. Anyway, this is in no way related to the implementation of RRNG. Maybe it's even a bug in libxslt or something.

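As one of the "loads of ways" of expressing the same rewrite, here is a minimal sketch in plain Python. It uses the standard library's `xml.etree.ElementTree` instead of lxml and XSLT so that it runs without extra dependencies, and it only rewrites the grammar; it does not validate anything. `relocate_start` is a hypothetical name, and the `ValueError` models the multi-file caveat: if the transformed file contains no `<start><ref/></start>`, there is nothing to replace. The quoting that the patch applies (`"'%s'" % start`) is only needed to pass a string parameter to XSLT, so it is absent here.

```python
import copy
import xml.etree.ElementTree as ET

RNG_NS = "http://relaxng.org/ns/structure/1.0"
START_TAG = "{%s}start" % RNG_NS
REF_TAG = "{%s}ref" % RNG_NS


def relocate_start(grammar: ET.Element, new_start: str) -> ET.Element:
    """Return a copy of `grammar` whose <start><ref/> points at `new_start`."""
    relocated = copy.deepcopy(grammar)  # leave the caller's grammar untouched
    refs = [
        ref
        for start in relocated.iter(START_TAG)
        for ref in start.findall(REF_TAG)
    ]
    if not refs:
        # Likely the multi-file case: <start> lives in an included grammar.
        raise ValueError("no <start><ref/> found in this grammar file")
    for ref in refs:
        ref.set("name", new_start)
    return relocated


GRAMMAR = """<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start><ref name="a"/></start>
  <define name="a"><element name="a"><ref name="b"/></element></define>
  <define name="b"><element name="b"><empty/></element></define>
</grammar>"""

original = ET.fromstring(GRAMMAR)
relocated = relocate_start(original, "b")
```

Only the `ref` directly below `start` is touched; the `ref` elements inside the `define` blocks keep their names, which mirrors the `rng:start/rng:ref` match in the stylesheet.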
> I also wonder whether we could make the API such that you don't have to copy stylesheets yourself, instead having this happen for you on the fly when you choose to validate a part. This though may require a bit of caching that the explicit form doesn't need.
Hmm, well, you could do that using either XPath or XSLT. You extract the name of the element where you are supposed to start, and as long as it only appears once in the RNG, you can find it and restructure the RNG to start at that point. S(implified)RNG should be able to ensure that this is possible, but I don't think libxml2 can handle SRNG, so there are some caveats. Having SRNG available would solve a lot of problems, though (like the RNG module import problem above).

BTW, I considered merging RRNG into RelaxNG, but I don't think it makes sense, since RRNG has the additional overhead of requiring a (non-modifiable) copy of the normal XML tree in memory, so that's bad for the common case without relocation.

Stefan

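The lookup step described above (find the definition for a given element name, but only if it is unambiguous) can be sketched as follows. This is an illustration under assumptions, not code from the patch: `find_define_for_element` is a hypothetical name, the standard library's `xml.etree.ElementTree` stands in for lxml, and the grammar is assumed to put each `<element>` directly inside its own `<define>`, which is roughly the shape a simplified grammar would have.

```python
import xml.etree.ElementTree as ET
from typing import Optional

RNG_NS = "http://relaxng.org/ns/structure/1.0"
DEFINE_TAG = "{%s}define" % RNG_NS
ELEMENT_TAG = "{%s}element" % RNG_NS


def find_define_for_element(grammar: ET.Element, element_name: str) -> Optional[str]:
    """Return the name of the single <define> declaring `element_name`.

    Returns None if the element name is not declared or is declared more
    than once, since relocation would then be ambiguous.
    """
    matches = []
    for define in grammar.iter(DEFINE_TAG):
        # Only direct <element> children count (assumed grammar shape).
        for element in define.findall(ELEMENT_TAG):
            if element.get("name") == element_name:
                matches.append(define.get("name"))
    return matches[0] if len(matches) == 1 else None


GRAMMAR = """<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start><ref name="doc"/></start>
  <define name="doc"><element name="a"><ref name="inner"/></element></define>
  <define name="inner"><element name="b"><empty/></element></define>
  <define name="dup1"><element name="c"><empty/></element></define>
  <define name="dup2"><element name="c"><text/></element></define>
</grammar>"""

grammar = ET.fromstring(GRAMMAR)
unique_hit = find_define_for_element(grammar, "b")
ambiguous_hit = find_define_for_element(grammar, "c")
missing_hit = find_define_for_element(grammar, "z")
```

The returned definition name is what would then be passed as the new start reference; elements declared inline or through name classes are deliberately out of scope for this sketch.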
participants (2)
- Martijn Faassen
- Stefan Behnel