[lxml-dev] Access to ElementTree for XML schema

I'm looking for a way to get access to an etree._ElementTree that represents an XML schema document in which the xsd:include and xsd:import elements have been recursively expanded.

When I create an instance of etree.XMLSchema, libxml2 expands the underlying C tree for the schema. Am I right about that? If so, is there a way for me to get an etree._ElementTree that wraps that underlying C tree? Or, perhaps to have a way to create an etree._ElementTree from the XMLSchema object?

If that document is not available, I suppose that I am asking for a new feature that enables us to retrieve the processed and expanded etree Document from an etree.XMLSchema object.

Or, is there already some other way to get an XML schema document tree in which the include and import elements have been (recursively) expanded?

The reason I'm asking for this -- I process XML schema documents, and I think we should encourage other Python hackers to do so, too. This (new) feature would enable lxml to support that.

I'm trying to implement this capability myself in Python using lxml, but my implementation still has bugs and I'm sure that libxml2 does it better than I can.

Thanks for help.

- Dave

--
Dave Kuhlman
http://www.rexx.com/~dkuhlman

Dave Kuhlman, 01.09.2010 01:08:
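> When I create an instance of etree.XMLSchema, libxml2 expands the underlying C tree for the schema. Am I right about that?

The best ways to find out are to a) read the libxml2 source code or b) add a little debug code that dumps the schema document to a file *after* parsing. Just go ahead, the XML Schema code in lxml is pretty short.

> If so, is there a way for me to get an etree._ElementTree that wraps that underlying C tree?

*If* the tree is available as a normal XML tree, it is trivial to copy it and wrap it in an ElementTree, sure.

> Or, is there already some other way to get an XML schema document tree in which the include and import elements have been (recursively) expanded?

No. However, is it really that hard to implement the algorithm for that in Python space? Admittedly, XML Schema is a severely complex format, but the import rules are definitely not the most complex part of the spec.

> I'm trying to implement this capability myself in Python using lxml, but my implementation still has bugs and I'm sure that libxml2 does it better than I can.

I'm pretty sure it handles imports and includes as specified. It does have a few remaining quirks for certain less common XML Schema features, but all in all, it works pretty well and is spec compliant.

Stefan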
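
For reference, the copy-and-wrap step mentioned above might look like the sketch below. It does not answer whether etree.XMLSchema exposes its expanded tree (that is still the open question in this thread), so it starts from an ordinary parsed schema document instead. The inline schema and the variable names are made up for illustration.

```python
import copy

from lxml import etree

# A tiny stand-in schema document (illustrative only).
XSD = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="note" type="xs:string"/>
</xs:schema>"""

root = etree.fromstring(XSD)

# Copy the tree so later changes do not touch the original...
root_copy = copy.deepcopy(root)

# ...and wrap the copy in an ElementTree.
tree = etree.ElementTree(root_copy)

# The wrapped copy can be compiled into a schema like any other tree.
schema = etree.XMLSchema(tree)
is_valid = schema.validate(etree.fromstring(b"<note>hi</note>"))
```

The deep copy matters when the wrapped tree is going to be modified, since the original document then stays untouched.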

On Fri, Sep 03, 2010 at 08:25:46PM +0200, Stefan Behnel wrote:
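> The best ways to find out are to a) read the libxml2 source code or b) add a little debug code that dumps the schema document to a file *after* parsing. Just go ahead, the XML Schema code in lxml is pretty short.

I've looked. I'll look at it a bit more. Seems like I'll need to learn more about Cython.

> No. However, is it really that hard to implement the algorithm for that in Python space?

Well, I *believe* that I've implemented it now. (That work was part of the delay in this response.) But I worry that there is some detail that I've gotten wrong.

Anyway, the new implementation of this is in the file process_includes.py, which is part of the generateDS.py distribution. If anyone needs it, you can find it here:

http://www.rexx.com/~dkuhlman/generateDS.html

Thanks for the help with this.

- Dave

--
Dave Kuhlman
http://www.rexx.com/~dkuhlman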
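
To give a rough idea of what include expansion in Python space involves, here is a minimal sketch. It is not the code in process_includes.py, and it deliberately covers only xsd:include: no xsd:import, no xsd:redefine, and no namespace adjustment for included schemas that lack a targetNamespace. The DOCS dictionary and the expand_includes function are hypothetical stand-ins; a real version would resolve schemaLocation against files or URLs.

```python
from lxml import etree

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical in-memory "files", keyed by schemaLocation.
# The two documents include each other to exercise the cycle guard.
DOCS = {
    "main.xsd": b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:include schemaLocation="types.xsd"/>
  <xs:element name="person" type="PersonType"/>
</xs:schema>""",
    "types.xsd": b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:include schemaLocation="main.xsd"/>
  <xs:complexType name="PersonType">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>""",
}


def expand_includes(location, seen=None):
    """Return the root of `location` with xs:include elements replaced
    by the (recursively expanded) children of the included schema."""
    seen = set() if seen is None else seen
    seen.add(location)
    root = etree.fromstring(DOCS[location])
    for inc in root.findall("{%s}include" % XS):
        target = inc.get("schemaLocation")
        index = root.index(inc)
        root.remove(inc)
        if target in seen:
            continue  # already pulled in; avoids infinite recursion
        # list() because inserting a child elsewhere moves it out of
        # the included root while we iterate.
        for child in list(expand_includes(target, seen)):
            root.insert(index, child)
            index += 1
    return root


expanded = expand_includes("main.xsd")
schema = etree.XMLSchema(etree.ElementTree(expanded))
doc = etree.fromstring(b"<person><name>Dave</name></person>")
is_valid = schema.validate(doc)
```

The `seen` set is the one detail that is easy to get wrong: without it, two schemas that include each other would recurse forever.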
