[lxml-dev] helper functions for accepting elementtrees/elements/documents as input
Hi!

I noticed that many of the API classes accept _ElementTree objects, although it would make just as much sense to apply them to elements or (internally) to documents. So here are three helper functions that you can use at the top of API functions to check what actually came in and to determine how to extract the document that is referred to and the context node within the document.

This simplifies clumsy expressions like XSLT(ElementTree(XML("..."))) and the like to XSLT(XML("...")).

I checked them into my branch, although they may need modification later, depending on Geert's work.

@Geert: I think these could be of interest to you if you extend them to check for filename strings and file-like objects. This may encourage writing a new function that returns both document and node at once to facilitate on-the-fly parsing.

Stefan

Index: src/lxml/etree.pyx
===================================================================
--- src/lxml/etree.pyx (Revision 19828)
+++ src/lxml/etree.pyx (Arbeitskopie)
@@ -1093,6 +1093,36 @@
 
 # Private helper functions
 
+cdef _Document _documentOrRaise(object input):
+    cdef _Document doc
+    doc = _documentOf(input)
+    if doc is None:
+        raise TypeError, "Invalid input object: %s" % type(input)
+    else:
+        return doc
+
+cdef _Document _documentOf(object input):
+    # call this to get the document of a
+    # _Document, _ElementTree or _NodeBase object
+    if isinstance(input, _ElementTree):
+        return (<_ElementTree>input)._doc
+    elif isinstance(input, _NodeBase):
+        return (<_NodeBase>input)._doc
+    elif isinstance(input, _Document):
+        return <_Document>input
+    else:
+        return None
+
+cdef _NodeBase _rootNodeOf(object input):
+    # call this to get the root node of a
+    # _Document, _ElementTree or _NodeBase object
+    if hasattr(input, 'getroot'): # Document/ElementTree
+        return <_NodeBase>(input.getroot())
+    elif isinstance(input, _NodeBase):
+        return <_NodeBase>input
+    else:
+        return None
+
 cdef xmlDoc* _fakeRootDoc(xmlDoc* c_base_doc, xmlNode* c_node):
     # build a temporary document that has the given node as root node
     # note that copy and original must not be modified during its lifetime!!
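For readers who do not have the Pyrex sources at hand, the dispatch logic of the three helpers can be sketched in plain Python. This is only an illustration under stated assumptions: the classes Document, NodeBase and ElementTree below are hypothetical stand-ins for lxml's internal _Document, _NodeBase and _ElementTree, and the snake_case function names are made up for this sketch. It is not lxml's actual implementation.

```python
class Document:
    """Hypothetical stand-in for the internal _Document type."""

    def __init__(self):
        self._root = None

    def getroot(self):
        return self._root


class NodeBase:
    """Hypothetical stand-in for _NodeBase: every node knows its document."""

    def __init__(self, doc, tag):
        self._doc = doc
        self.tag = tag


class ElementTree:
    """Hypothetical stand-in for _ElementTree: wraps a root node."""

    def __init__(self, root):
        self._doc = root._doc
        self._root = root

    def getroot(self):
        return self._root


def document_of(obj):
    # Return the document behind a tree, a node or a document, else None.
    if isinstance(obj, (ElementTree, NodeBase)):
        return obj._doc
    if isinstance(obj, Document):
        return obj
    return None


def document_or_raise(obj):
    # Same as document_of(), but reject unsupported input with a TypeError.
    doc = document_of(obj)
    if doc is None:
        raise TypeError("Invalid input object: %s" % type(obj))
    return doc


def root_node_of(obj):
    # Documents and trees expose getroot(); a bare node is its own context.
    if hasattr(obj, "getroot"):
        return obj.getroot()
    if isinstance(obj, NodeBase):
        return obj
    return None


# Build a tiny document by hand to exercise the helpers.
doc = Document()
root = NodeBase(doc, "root")
doc._root = root
tree = ElementTree(root)
```

With helpers like these at the top of an API function, a caller could pass the tree, the root node or the document itself and the function would resolve all three to the same document and context node.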
Stefan Behnel wrote:
Hi!
I noticed that many of the API classes accept _ElementTree objects, although it would make just as much sense to apply them to elements or (internally) to documents.
While I agree that cleaning up how this works might be a good thing, I wonder what this means for ElementTree compatibility. I.e. it'll become a lot easier to write lxml code that doesn't work in ElementTree. Perhaps we should check with Fredrik Lundh what he thinks first.

Regards,
Martijn
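To make the compatibility concern concrete: code that wants to stay portable can normalise its input explicitly instead of relying on the library to accept bare elements. The following is a minimal sketch using only the standard library's xml.etree.ElementTree; ensure_tree is a hypothetical helper name chosen for this example and is not part of lxml or ElementTree.

```python
import xml.etree.ElementTree as ET


def ensure_tree(obj):
    # Accept either an ElementTree or a bare Element and always hand
    # back an ElementTree, so downstream code sees one input type.
    if isinstance(obj, ET.ElementTree):
        return obj
    if ET.iselement(obj):
        return ET.ElementTree(obj)
    raise TypeError("Invalid input object: %s" % type(obj))


root = ET.XML("<a><b/></a>")   # XML() returns a bare Element
tree = ensure_tree(root)       # explicit wrapping keeps the call portable
```

Code written this way always passes a tree to the API, so it does not depend on whether a given implementation also accepts elements.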
Stefan Behnel wrote:
I noticed that many of the API classes accept _ElementTree objects, although it would make just as much sense to apply them to elements or (internally) to documents. So here are three helper functions that you can use at the top of API functions to check what actually came in and to determine how to extract the document that is referred to and the context node within the document.
This simplifies clumsy expressions like XSLT(ElementTree(XML("..."))) and the like to XSLT(XML("...")).
I checked them into my branch, although they may need modification later, depending on Geert's work.
While I agree that cleaning up how this works might be a good thing, I wonder what this means for ElementTree compatibility. I.e. it'll become a lot easier to write lxml code that doesn't work in ElementTree. Perhaps we should check with Fredrik Lundh what he thinks first. [snip]
Sorry for the duplicates; I have a problem with my mail client.

Martijn
participants (2)
- Martijn Faassen
- Stefan Behnel