Generating code from schema
Hi, just a quick question about what you can and cannot do with lxml's schema support. In openpyxl we're moving towards a 1:1 implementation of the underlying schema. lxml.objectify isn't directly an option for two reasons: lxml is an optional dependency and there are cases where we'd definitely run out of memory. Instead we're using descriptors to enforce type definitions. This means a little more code, but now that it seems to be working well I was wondering whether we could automate some of the process. I've looked at some of the existing XSD-to-Python generators, but the generated code is far from what I'd like to have. Can we use the lxml schema support for anything other than validation? I.e., can I query a schema object for a particular definition? Or is the best approach to parse the XSD files directly and work through the definitions with a mapping? Charlie -- Charlie Clark Managing Director Clark Consulting & Research German Office Kronenstr. 27a Düsseldorf D- 40217 Tel: +49-211-600-3657 Mobile: +49-178-782-6226
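For readers unfamiliar with the descriptor approach mentioned above, here is a minimal sketch of a type-enforcing descriptor. It is not openpyxl's actual code: the names `Typed` and `Column` are hypothetical, and it relies on `__set_name__`, which needs Python 3.6 or later (newer than the interpreters in use when this thread was written).

```python
class Typed:
    """Descriptor that coerces assigned values to a declared type (illustrative only)."""

    def __init__(self, expected_type):
        self.expected_type = expected_type
        self.name = None

    def __set_name__(self, owner, name):
        # Called automatically (Python 3.6+) when the owning class is created.
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        if not isinstance(value, self.expected_type):
            try:
                # Try a conversion first, e.g. "12.5" -> 12.5
                value = self.expected_type(value)
            except (TypeError, ValueError):
                raise TypeError(
                    f"{self.name} expects {self.expected_type.__name__}"
                ) from None
        instance.__dict__[self.name] = value


class Column:
    # Hypothetical element with two typed attributes.
    width = Typed(float)
    hidden = Typed(bool)

    def __init__(self, width, hidden=False):
        self.width = width
        self.hidden = hidden
```

Assigning `Column("12.5")` stores a float, while a value that cannot be converted raises `TypeError` at assignment time rather than when the file is written.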
Hi, "lxml" <lxml-bounces@lxml.de> wrote on 23.10.2014 11:48:12:
From: "Charlie Clark" Subject: [lxml] Generating code from schema
just a quick question about what you can and cannot do with lxml's schema
support. In openpyxl we're moving towards a 1:1 implementation of the underlying schema. lxml.objectify isn't directly an option for two reasons: lxml is an optional dependency and there are cases where we'd definitely run out of memory.
Out of curiosity: Run out of memory when doing what exactly? Can't really be for representing the schema, can it?
Instead we're using descriptors to enforce type definitions. This means a little more code but now that it seems to
be working well I was thinking whether we could automate some of the process. I've looked at some of the existing XSD to Python generators but
the generated code is far from what I'd like to have.
Can we use the lxml schema support for anything other than validation? ie. can I query a schema object for a particular definition? Or is the best approach to parse the XSD files directly and work through the definitions
with a mapping?
In this old lxml blueprint Stefan mentions that you cannot easily hook into the libxml2 schema support: https://bugs.launchpad.net/lxml/+bug/186600 Similar questions about "typification" or instance generation have been asked before, but unfortunately it looks like lxml can't use libxml2 for this. So you'd need to (re-)implement XML Schema capabilities to do anything with the schema information, i.e. adding/enforcing type information, generating structure or whatnot. Holger Landesbank Baden-Wuerttemberg Anstalt des oeffentlichen Rechts Hauptsitze: Stuttgart, Karlsruhe, Mannheim, Mainz HRA 12704 Amtsgericht Stuttgart
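Since an XSD file is itself XML, the "parse the XSD directly and work through the definitions with a mapping" option can be sketched with the standard library alone. This is only an illustration under stated assumptions: the sample schema, the `XSD_TO_PY` table and the `attribute_types` helper are all made up, and matching on the literal `xs:` prefix is a simplification (a real schema may bind the XML Schema namespace to any prefix).

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Assumed mapping from XSD built-in types to Python types; extend as needed.
XSD_TO_PY = {
    "xs:string": str,
    "xs:boolean": bool,
    "xs:unsignedInt": int,
    "xs:double": float,
}

# Hypothetical schema fragment used only for this example.
SAMPLE_XSD = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="CT_Example">
    <xs:attribute name="width" type="xs:double" use="required"/>
    <xs:attribute name="hidden" type="xs:boolean"/>
    <xs:attribute name="style" type="ST_Custom"/>
  </xs:complexType>
</xs:schema>
"""


def attribute_types(xsd_text, type_name):
    """Return {attribute name: Python type, or the raw XSD type name if unmapped}."""
    root = ET.fromstring(xsd_text)
    for ct in root.iter(XS + "complexType"):
        if ct.get("name") == type_name:
            return {
                attr.get("name"): XSD_TO_PY.get(attr.get("type"), attr.get("type"))
                for attr in ct.iter(XS + "attribute")
            }
    raise KeyError(type_name)


types = attribute_types(SAMPLE_XSD, "CT_Example")
```

Types that are not in the mapping (here `ST_Custom`) come back as plain strings, which marks where schema-specific simple types would still need handling.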
On 10/23/14 13:43, Holger Joukl wrote:
So you'd need to (re-) implement XML Schema capabilities to do anything with the Schema information, i.e. adding/enforcing type information, generating structure or whatnot.
FYI, most of this is implemented in various places inside Spyne. Here's the XML Schema parser: https://github.com/arskom/spyne/blob/master/spyne/interface/xml_schema/parse... It deserializes the schema according to the schema definition in https://github.com/arskom/spyne/blob/master/spyne/interface/xml_schema/defn.... and creates Python classes from object definitions found in the schema. The code imitates libxml's own schema parser up to a point -- I wrote the Python code with libxml's C code open side-by-side. Enforcing schema rules on incoming XML documents has been supported since around 2009. Here's a Stack Overflow answer that solves a given problem using both Spyne and lxml.objectify: http://stackoverflow.com/questions/19545067/python-joining-and-writing-xml-e... And here's how all this is supposed to be used together: https://github.com/arskom/spyne/blob/master/examples/xml/schema.py The missing bit is an xsd => python compiler. I started implementing it but figured it was a bit pointless and stopped working on it. Here's what's done so far: https://github.com/arskom/spyne/blob/master/spyne/interface/xml_schema/genpy... Best, Burak
Hiya Burak, On .10.2014 at 14:18, Burak Arslan <burak.arslan@arskom.com.tr> wrote: all looks very good, thanks.
The missing bit is xsd => python compiler. I started implementing it but figured it was a bit pointless and stopped working on it.
That's when it gets a bit application / domain-specific but from the bit of code I've looked at, it looks like you've adopted a very similar approach by using descriptors. Does your meta-class handle the conflict if __slots__ are used? Charlie
On 10/26/14 15:32, Charlie Clark wrote:
Hiya Burak,
On .10.2014 at 14:18, Burak Arslan <burak.arslan@arskom.com.tr> wrote:
all looks very good, thanks.
The missing bit is xsd => python compiler. I started implementing it but figured it was a bit pointless and stopped working on it.
That's when it gets a bit application / domain-specific
Unless I'm missing something, it only gets Spyne-specific, and if that's acceptable, there isn't much flexibility left in converting an object definition from XML Schema format to Spyne format.
but from the bit of code I've looked at, it looks like you've adopted a very similar approach by using descriptors. Does your meta-class handle the conflict if __slots__ are used?
No, I don't treat __slots__ specially. Why do you need it? burak
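As background to the question: the conflict presumably meant here is that CPython refuses to create a class in which a name listed in `__slots__` is also bound as a class attribute, which is exactly what a descriptor is. The sketch below shows the error and one common workaround (storing the value in a slot with a different name). `Flag` and `Font` are hypothetical names, not code from Spyne or openpyxl.

```python
class Flag:
    """Minimal data descriptor that stores a bool under a private name."""

    def __init__(self, name):
        self.storage = "_" + name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return getattr(instance, self.storage)

    def __set__(self, instance, value):
        setattr(instance, self.storage, bool(value))


def define_conflicting_class():
    # The slot and the descriptor share the name "bold", so class creation fails.
    class Font:
        __slots__ = ("bold",)
        bold = Flag("bold")

    return Font


try:
    define_conflicting_class()
    conflict = None
except ValueError as exc:
    conflict = str(exc)  # CPython reports that the slot conflicts with a class variable


class Font:
    # Workaround: the slot holds the data under "_bold", the descriptor owns "bold".
    __slots__ = ("_bold",)
    bold = Flag("bold")

    def __init__(self, bold=False):
        self.bold = bold
```

With the workaround, instances still have no `__dict__`, so the memory saving that usually motivates `__slots__` is kept while assignment goes through the descriptor.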
On .10.2014 at 14:18, Burak Arslan <burak.arslan@arskom.com.tr> wrote:
The missing bit is xsd => python compiler. I started implementing it but figured it was a bit pointless and stopped working on it. Here's what's done so far: https://github.com/arskom/spyne/blob/master/spyne/interface/xml_schema/genpy...
To return briefly to this after some more tinkering. I can only agree with you that anyone wanting to use this in anything serious has to be insane. I'm working with OOXML which counts as insane, I think! ;-) That said, the generated code can be a start for a Pythonic interface to an XML schema. Unfortunately, I wasn't able to work out from the example how to use it to generate some code for a particular element of a schema. My own horrible spaghetti code is now generating nearly usable Python classes: https://bitbucket.org/openpyxl/openpyxl/src/4398c749e7227834e25bb9a345de2847... Charlie
On 01/26/15 17:35, Charlie Clark wrote:
On .10.2014 at 14:18, Burak Arslan <burak.arslan@arskom.com.tr> wrote:
The missing bit is xsd => python compiler. I started implementing it but figured it was a bit pointless and stopped working on it. Here's what's done so far: https://github.com/arskom/spyne/blob/master/spyne/interface/xml_schema/genpy...
To return briefly to this after some more tinkering. I can only agree with you that anyone wanting to use this in anything serious has to be insane. I'm working with OOXML which counts as insane, I think! ;-)
That said, the generated code can be a start for a Pythonic interface to an XML schema. Unfortunately, I wasn't able to work out from the example how to use it to generate some code for a particular element of a schema.
Perhaps I was a little unclear as to why regenerating Python classes from xml schema definitions was pointless. Obviously in Python, it's possible to define classes at runtime. Actually, that's what Spyne's XML Schema parser does. It defines classes exactly the same way a normal Python class gets defined -- you know, by instantiating a `type` subclass, which is the most popular metaclass base out there. So you don't need to generate code. You can parse an xsd document and directly use classes from that document. In other words, thanks to Spyne, you can directly import xsd documents. When you look at this line: https://github.com/arskom/spyne/blob/05af7fe2c99f75b893276169c4ec7949a3e9da0... The example there serializes the SomeObject definition to an xml schema document, and then deserializes it and assigns it to the NewObject variable.
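To make the "define classes at runtime" point concrete, here is a generic sketch that calls the built-in `type` directly. It does not use Spyne: Spyne goes through its own `type` subclass, whereas `make_class` below is a hypothetical helper written only for this illustration.

```python
def make_class(name, fields):
    """Build a class at runtime; `fields` maps attribute names to default values."""

    def __init__(self, **kwargs):
        for key, default in fields.items():
            setattr(self, key, kwargs.get(key, default))

    namespace = {"__init__": __init__, "fields": tuple(fields)}
    # type(name, bases, namespace) is the same call a `class` statement ends up making.
    return type(name, (object,), namespace)


# In a schema-driven setup the name and fields would come from a parsed definition.
SomeObject = make_class("SomeObject", {"i": 0, "s": ""})
obj = SomeObject(i=5)
```

The resulting `SomeObject` behaves like any class written by hand, which is why no source code needs to be generated for it to be usable.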
My own horrible spaghetti code is now generating nearly usable Python classes: https://bitbucket.org/openpyxl/openpyxl/src/4398c749e7227834e25bb9a345de2847...
Now if, with all of the above said, you're still hell-bent on generating Python code from an XML schema document, you can do it like this: https://github.com/plq/spyne/commit/78c3a86b26638fa6b82578f4cdc3550ca54aa11e Best of luck :) Cheers, burak
On .01.2015 at 00:51, Burak Arslan <burak.arslan@arskom.com.tr> wrote:
Perhaps I was a little unclear as to why regenerating Python classes from xml schema definitions was pointless.
Obviously in Python, it's possible to define classes at runtime. Actually, that's what Spyne's XML Schema parser does. It defines classes exactly the same way a normal Python class gets defined -- you know, by instantiating a `type` subclass, which is the most popular metaclass base out there.
Depends on what you want to do with them. Debugging and testing any code objects created at runtime can be fun. But I agree it's perfectly fine in Spyne. The reason I want to do it is that I'm not happy with the current XML schema, both because of its arbitrary and unnecessary nesting (you often have booleans and similar as child objects rather than attributes, and there are other, even greater insanities) and because of its often bizarre element names. As this is stuff that is to be exposed to client code via an API, it's nice to be able to smooth some of the edges so that people can get to work without repeatedly poring over several thousand pages of specification.
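As an illustration of the kind of smoothing described here, the sketch below reads booleans that the XML stores as child elements and exposes them as plain attributes with friendlier names. The sample fragment is made up and has no namespace, `nested_bool` and `Font` are hypothetical, and treating a missing `val` attribute as true is an assumption of this example, not a statement about openpyxl's implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free fragment with booleans stored as child elements.
SAMPLE = '<font><b val="1"/><i val="0"/><name val="Calibri"/></font>'


def nested_bool(parent, tag, default=False):
    """Read a boolean that the schema stores as a child element."""
    child = parent.find(tag)
    if child is None:
        return default
    # Assumption: an element without a val attribute counts as true.
    return child.get("val", "1") in ("1", "true")


class Font:
    """Flattened view: client code sees attributes, not nested elements."""

    def __init__(self, bold=False, italic=False):
        self.bold = bold
        self.italic = italic

    @classmethod
    def from_xml(cls, text):
        node = ET.fromstring(text)
        return cls(bold=nested_bool(node, "b"), italic=nested_bool(node, "i"))


font = Font.from_xml(SAMPLE)
```

Client code then works with `font.bold` and `font.italic` and never needs to know that the file format spells them `b` and `i`.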
So you don't need to generate code. You can parse an xsd document and directly use classes from that document. In other words, thanks to Spyne, you can directly import xsd documents.
When you look at this line: https://github.com/arskom/spyne/blob/05af7fe2c99f75b893276169c4ec7949a3e9da0... The example there serializes the SomeObject definition to an xml schema document, and then deserializes it and assigns it to the NewObject variable.
Thanks very much for the tip but I've fallen at the first hurdle:
schema = parse_schema_file("openpyxl/tests/schemas/sml.xsd")
Traceback (most recent call last):
  File "/Applications/WingIDE.app/Contents/Resources/src/debug/tserver/_sandbox.py", line 1, in <module>
    # Used internally for debug sandbox under external interpreter
  File "/Users/charlieclark/Projects/openpyxl/lib/python3.4/site-packages/spyne/util/xml.py", line 153, in parse_schema_file
    .parse_schema(elt)
  File "/Users/charlieclark/Projects/openpyxl/lib/python3.4/site-packages/spyne/interface/xml_schema/parser.py", line 545, in parse_schema
    file_name = self.files[imp.namespace]
builtins.KeyError: 'http://schemas.openxmlformats.org/officeDocument/2006/relationships'
While my own code sort of works, I'm obviously not an expert in this field and am hopeful that Spyne can be useful. Charlie
On 01/27/15 12:47, Charlie Clark wrote:
Thanks very much for the tip but I've fallen at the first hurdle:
schema = parse_schema_file("openpyxl/tests/schemas/sml.xsd")
Traceback (most recent call last):
  File "/Applications/WingIDE.app/Contents/Resources/src/debug/tserver/_sandbox.py", line 1, in <module>
    # Used internally for debug sandbox under external interpreter
  File "/Users/charlieclark/Projects/openpyxl/lib/python3.4/site-packages/spyne/util/xml.py", line 153, in parse_schema_file
    .parse_schema(elt)
  File "/Users/charlieclark/Projects/openpyxl/lib/python3.4/site-packages/spyne/interface/xml_schema/parser.py", line 545, in parse_schema
    file_name = self.files[imp.namespace]
builtins.KeyError: 'http://schemas.openxmlformats.org/officeDocument/2006/relationships'
Ah, that looks like a bug. Please file an issue at the Spyne issue tracker with that schema file attached so that I can have a look at it. Thanks, burak
On .01.2015 at 14:55, Burak Arslan <burak.arslan@arskom.com.tr> wrote:
ah, that looks like a bug. please file an issue at spyne issue tracker with that schema file attached so that I can have a look at it.
Will do, though I don't know how you add text files to GitHub issues. The files are all pretty chunky and it might make more sense to get the package directly from ECMA. Charlie
On Tue, Jan 27, 2015 at 03:07:28PM +0100, Charlie Clark wrote:
On .01.2015 at 14:55, Burak Arslan <burak.arslan@arskom.com.tr> wrote:
ah, that looks like a bug. please file an issue at spyne issue tracker with that schema file attached so that I can have a look at it.
Will do, though I don't know how you add text files to Github issues. The files are all pretty chunky and it might make more sense to get the package directly from ECMA
I believe it's customary to add them to a gist (https://gist.github.com/) and then link to that gist from a GitHub issue. Marius Gedminas -- If your company is not involved in something called "ISO 9000" you probably have no idea what it is. If your company _is_ involved in ISO 9000 then you definitely have no idea what it is. (Scott Adams - The Dilbert principle)
On .01.2015 at 21:11, Marius Gedminas <marius@gedmin.as> wrote:
I believe it's customary to add them to a gist (https://gist.github.com/) and then link to that gist from a GitHub issue.
Thanks for the tip. Charlie
On 01/27/15 22:42, Charlie Clark wrote:
On .01.2015 at 21:11, Marius Gedminas <marius@gedmin.as> wrote:
I believe it's customary to add them to a gist (https://gist.github.com/) and then link to that gist from a GitHub issue.
Thanks for the tip.
Charlie
As a follow-up, this was due to Spyne not implementing <xs:union> and <xs:list> for <xs:simpleType>. Some workarounds are in place now for that, but patches are welcome. See https://github.com/arskom/spyne/issues/422 and https://github.com/arskom/spyne/issues/423 . I think it's best to move this discussion to the issues or, failing that, to http://lists.spyne.io/listinfo/people as we're barely on topic here now. Thanks all. Best, burak
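For anyone wanting to check whether their own schema uses these constructs, a standard-library sketch like the following lists each named simple type together with how it is derived. The sample schema and the `simple_type_kinds` helper are made up for this illustration and are unrelated to Spyne's parser.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema showing the three ways a simpleType can be derived.
SAMPLE_XSD = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="ST_Plain">
    <xs:restriction base="xs:string"/>
  </xs:simpleType>
  <xs:simpleType name="ST_Refs">
    <xs:list itemType="xs:string"/>
  </xs:simpleType>
  <xs:simpleType name="ST_Either">
    <xs:union memberTypes="xs:boolean ST_Plain"/>
  </xs:simpleType>
</xs:schema>
"""


def simple_type_kinds(xsd_text):
    """Map each named simpleType to 'restriction', 'list' or 'union'."""
    root = ET.fromstring(xsd_text)
    kinds = {}
    for st in root.iter(XS + "simpleType"):
        name = st.get("name")
        if name is None:
            continue  # anonymous (inline) simple types are skipped in this sketch
        for child in st:
            tag = child.tag.replace(XS, "")
            if tag in ("restriction", "list", "union"):
                kinds[name] = tag
    return kinds


kinds = simple_type_kinds(SAMPLE_XSD)
needs_attention = sorted(n for n, k in kinds.items() if k in ("list", "union"))
```

Here `needs_attention` collects the types derived by list or union, i.e. the ones a parser without support for those constructs would stumble over.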
On .01.2015 at 11:08, Burak Arslan <burak.arslan@arskom.com.tr> wrote:
As a follow up, this was due to Spyne not implementing <xs:union> and <xs:list> for <xs:simpleType>. Some workarounds are in place now for that, but patches are welcome. See https://github.com/arskom/spyne/issues/422 and https://github.com/arskom/spyne/issues/423 .
I think it's best to move this discussion to the issues or if not to http://lists.spyne.io/listinfo/people as we're barely on topic here now.
Well, we can do that but I do think there may be value in it for a wider public. While we should certainly keep the details off this list, I think the approach that Spyne follows, which I think I need to adapt to my particular problem, is similar to the objectify approach in lxml. Charlie
participants (4)
- Burak Arslan
- Charlie Clark
- Holger Joukl
- Marius Gedminas