Relax NG Compact syntax support

Hi all,

I recently had a situation at work where I wanted to use the RELAX NG Compact syntax (RNC) to write a schema for some XML configuration files we are using. Having happily worked with lxml before, I googled around to see if I could feed it RNC schemas somehow, and was disappointed to find only some mailing list threads saying this was not possible. There was some Python code by David Mertz that implemented a small subset of RNC, but it was deemed too limited to be fit for integration with lxml.

I've since spent quite some time looking at rnc2rng and polishing it. My fork is here: https://github.com/djc/rnc2rng

It now supports almost all of the syntax surface according to the RNC spec, I've created a fairly comprehensive regression test set (with about 90% coverage right now), and it works successfully on some relatively large real-world schemas that I ran into.

Would you be interested to collaborate on somehow providing support for rnc2rng in lxml?

Cheers,
Dirkjan

Dirkjan Ochtman wrote on 01.11.2015 at 13:07:
Sounds great!
> Would you be interested to collaborate on somehow providing support for rnc2rng in lxml?
I'd be happy to make lxml call rnc2rng at need. The RelaxNG API is implemented in relaxng.pxi. There could be a helper function that takes an RNC file path or file(-like) object, tries importing rnc2rng, and then uses it to parse it into a RelaxNG object.

The simplest way to generate an lxml RNG tree seems to be through the parser. The write() calls in the XML serialiser would need to be directed to parser.feed() through a simple wrapper, and the result of the parsing (root = parser.close()) can be passed directly into RelaxNG().

Stefan
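
The wrapper idea described above can be sketched roughly as follows. This is only an illustration, not code from lxml or rnc2rng: FeedWriter and serialise_grammar are hypothetical names, and the serialiser is a stand-in that writes a tiny hand-made grammar in chunks. The sketch falls back to the standard library's ElementTree when lxml is not installed, since both parsers offer feed() and close().

```python
try:
    from lxml import etree  # preferred; this thread is about lxml
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback with feed()/close()


class FeedWriter:
    """Hypothetical file-like adapter: forwards write() calls to parser.feed()."""

    def __init__(self, parser):
        self._parser = parser

    def write(self, data):
        self._parser.feed(data)


def serialise_grammar(out):
    # Hypothetical stand-in for an XML serialiser that emits RNG in chunks.
    out.write('<grammar xmlns="http://relaxng.org/ns/structure/1.0">')
    out.write('<start><element name="config"><text/></element></start>')
    out.write('</grammar>')


parser = etree.XMLParser()
serialise_grammar(FeedWriter(parser))
root = parser.close()  # root element of the parsed RNG tree
print(root.tag)
```

With lxml installed, the resulting root element could then be handed to etree.RelaxNG(root), which is the last step the message describes.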

On Sun, Nov 1, 2015 at 4:01 PM, Stefan Behnel <stefan_ml@behnel.de> wrote:
Sounds good!
This sounds like you want the serializer to implement a streaming API. Unfortunately, the serializer output cannot currently be streamed easily, because the opening grammar element declares namespaces that are discovered during serialization of the node tree.

Can we start the integration with a simplified API that just keeps the full RNG XML in memory before parsing it, or do you think a streaming API is a requirement?

Cheers,
Dirkjan
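
To illustrate why the opening tag is the obstacle, here is a minimal sketch of the in-memory approach. It is not rnc2rng's actual serializer: serialise_in_memory and its (prefix, uri, name) input format are assumptions made up for this example. The body is produced first, namespaces are collected along the way, and only then can the root start tag be written, so the whole document is assembled as one string before parsing.

```python
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback for the sketch

RNG_NS = 'http://relaxng.org/ns/structure/1.0'


def serialise_in_memory(elements):
    """Hypothetical serializer; elements is a list of (prefix, uri, name)."""
    namespaces = {}
    body = []
    for prefix, uri, name in elements:
        namespaces[prefix] = uri  # only discovered while walking the tree
        body.append('<%s:%s/>' % (prefix, name))
    # The root start tag depends on everything seen above, so it comes last.
    decls = ''.join(' xmlns:%s="%s"' % (p, u)
                    for p, u in sorted(namespaces.items()))
    return '<grammar xmlns="%s"%s>%s</grammar>' % (RNG_NS, decls, ''.join(body))


xml_text = serialise_in_memory([
    ('a', 'http://example.com/a', 'first'),
    ('b', 'http://example.com/b', 'second'),
])
root = etree.fromstring(xml_text)  # parse the complete document in one go
```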

On Sun, Nov 1, 2015 at 4:01 PM, Stefan Behnel <stefan_ml@behnel.de> wrote:
I've taken a whack at this:

diff --git a/src/lxml/relaxng.pxi b/src/lxml/relaxng.pxi
index de486e1..4e2dd96 100644
--- a/src/lxml/relaxng.pxi
+++ b/src/lxml/relaxng.pxi
@@ -1,6 +1,11 @@
 # support for RelaxNG validation
 from lxml.includes cimport relaxng
 
+try:
+    import rnc2rng
+except ImportError:
+    rnc2rng = None
+
 class RelaxNGError(LxmlError):
     u"""Base class for RelaxNG errors.
     """
@@ -45,7 +50,18 @@ cdef class RelaxNG(_Validator):
             fake_c_doc = _fakeRootDoc(doc._c_doc, root_node._c_node)
             parser_ctxt = relaxng.xmlRelaxNGNewDocParserCtxt(fake_c_doc)
         elif file is not None:
-            if _isString(file):
+            if _isString(file) and file.endswith('.rnc'):
+                if rnc2rng is None:
+                    msg = 'compact syntax not supported (please install rnc2rng)'
+                    raise RelaxNGParseError(msg)
+                else:
+                    etree = fromstring(rnc2rng.dumps(rnc2rng.load(file)))
+                    doc = _documentOrRaise(etree)
+                    root_node = _rootNodeOrRaise(etree)
+                    c_node = root_node._c_node
+                    fake_c_doc = _fakeRootDoc(doc._c_doc, root_node._c_node)
+                    parser_ctxt = relaxng.xmlRelaxNGNewDocParserCtxt(fake_c_doc)
+            elif _isString(file):
                 doc = None
                 filename = _encodeFilename(file)
                 with self._error_log:

Does that make sense?

I had an earlier version of this patch sort of working, but now it triggers a build error that I don't understand:

Traceback (most recent call last):
  File "setup.py", line 233, in <module>
    **setup_extra_options()
  File "/usr/lib/python2.7/distutils/core.py", line 111, in setup
    _setup_distribution = dist = klass(attrs)
  File "/usr/lib/python2.7/site-packages/setuptools/dist.py", line 272, in __init__
    _Distribution.__init__(self,attrs)
  File "/usr/lib/python2.7/distutils/dist.py", line 287, in __init__
    self.finalize_options()
  File "/usr/lib/python2.7/site-packages/setuptools/dist.py", line 326, in finalize_options
    ep.require(installer=self.fetch_build_egg)
  File "/usr/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2370, in require
    reqs = self.dist.requires(self.extras)
  File "/usr/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2602, in requires
    dm = self._dep_map
  File "/usr/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2658, in __getattr__
    raise AttributeError(attr)
AttributeError: _dep_map
Makefile:17: recipe for target 'inplace' failed
make: *** [inplace] Error 1

Cheers,
Dirkjan
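
Outside of Cython, the optional-import and extension-dispatch logic from the patch above might look like the following plain-Python sketch. The function name read_schema_as_rng_xml and the use of RuntimeError are assumptions for illustration; only rnc2rng.load() and rnc2rng.dumps() are taken from the patch itself.

```python
import os
import tempfile

try:
    import rnc2rng  # optional dependency, as in the patch
except ImportError:
    rnc2rng = None


def read_schema_as_rng_xml(path):
    """Hypothetical helper: return RELAX NG XML text for a .rng or .rnc file."""
    if path.endswith('.rnc'):
        if rnc2rng is None:
            raise RuntimeError(
                'compact syntax not supported (please install rnc2rng)')
        # load() and dumps() are the two calls used in the patch.
        return rnc2rng.dumps(rnc2rng.load(path))
    with open(path) as f:
        return f.read()


# Usage with a plain .rng file, which needs no optional dependency.
with tempfile.TemporaryDirectory() as tmp:
    rng_path = os.path.join(tmp, 'schema.rng')
    with open(rng_path, 'w') as f:
        f.write('<grammar xmlns="http://relaxng.org/ns/structure/1.0"/>')
    rng_text = read_schema_as_rng_xml(rng_path)
```

Importing at module level and checking for None at call time keeps rnc2rng a soft dependency: users who never pass a .rnc path are unaffected when it is missing.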

Dirkjan Ochtman wrote on 11.12.2015 at 09:44:
Merged, thanks. I cleaned it up a bit afterwards.

Two things you could add:

- tests, would go into test_relaxng.py with a conditional import as in test_css.py (remember: code that isn't tested is most likely broken)

- string parsing support using a separate helper class method like RelaxNG.from_rnc_data()

Stefan
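
The second suggestion, a separate class method for string input, follows the common alternative-constructor pattern. The sketch below uses a toy Schema class and a placeholder fake_convert function, both hypothetical; it is not lxml's RelaxNG class and performs no real RNC conversion.

```python
class Schema:
    """Toy stand-in for a validator class; not lxml's RelaxNG."""

    def __init__(self, rng_xml):
        self.rng_xml = rng_xml

    @classmethod
    def from_rnc_data(cls, rnc_text, convert):
        """Build a Schema from compact-syntax text held in a string.

        convert is a hypothetical callable mapping RNC text to RNG XML.
        """
        return cls(convert(rnc_text))


def fake_convert(rnc_text):
    # Placeholder for illustration only; a real converter would parse RNC.
    return '<grammar><!-- from %d characters of RNC --></grammar>' % len(rnc_text)


schema = Schema.from_rnc_data('element config { text }', fake_convert)
```

Keeping string parsing in its own class method leaves the main constructor's file handling untouched, so a string can never be mistaken for a file path.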

On Fri, Dec 11, 2015 at 2:00 PM, Stefan Behnel <stefan_ml@behnel.de> wrote:
> Merged, thanks. I cleaned it up a bit afterwards.
Thanks!
Ah, I'd wondered when you'd be asking about that! Asking after merging the initial PR actually seems like a nice way of doing it. :)
> - string parsing support using a separate helper class method like RelaxNG.from_rnc_data()
I've added tests, the proposed helper method and another tweak here: https://github.com/lxml/lxml/pull/182

However, it seems to make the tests fail (on Travis), despite me trying to follow the test_css skipif pattern. Any hints on how I can fix it up? (On my machine with rnc2rng installed, the tests pass.)

Cheers,
Dirkjan

Dirkjan Ochtman wrote on 12.12.2015 at 22:10:
The test suite isn't actually executed with py.test (for legacy and dependency reasons), it just *can* be. To make it work with standard unittest as well, there's a second part to the skipping at the bottom of test_css.py in the test_suite() function which simply excludes it from the test suite it builds. I had to look that up as well because I had completely forgotten about it. :)

Stefan
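
The two-part skipping described above can be sketched like this. It is an approximation rather than the contents of test_css.py: the test case name is made up, and unittest.skipIf stands in for whatever skip marker the real file uses. The decorator covers runners that collect the class directly, while test_suite() leaves the tests out entirely when the optional module is missing.

```python
import unittest

try:
    import rnc2rng  # optional dependency
except ImportError:
    rnc2rng = None


# Part one: mark the tests as skipped for runners that collect the class.
@unittest.skipIf(rnc2rng is None, 'rnc2rng is not installed')
class RncSupportTestCase(unittest.TestCase):  # hypothetical test case
    def test_module_available(self):
        self.assertTrue(hasattr(rnc2rng, 'load'))


# Part two: for plain unittest, do not add the tests to the suite at all.
def test_suite():
    suite = unittest.TestSuite()
    if rnc2rng is not None:
        suite.addTests(
            unittest.defaultTestLoader.loadTestsFromTestCase(RncSupportTestCase))
    return suite
```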

On Sun, Dec 27, 2015 at 4:44 PM, Stefan Behnel <stefan_ml@behnel.de> wrote:
Heh, nice! I've fixed up the PR to make the tests pass (they now do -- though the PyPy ones are still going) and also addressed your comment on the PR about accessing a file object's name attribute. Can you take another look, please?

Thanks,
Dirkjan
