[lxml-dev] cElementTree compatibility issue
Hello,

I wanted to try replacing cElementTree with lxml.etree in the promising bazaar-ng SCM tool [1], but it seems to me there is an API compatibility issue. I read the compatibility doc [2] so as to replace the import lines in the BZR source code, but then when I run the tests, I get::

  AttributeError: 'etree._ElementTree' object has no attribute 'parse'

Here are the steps to reproduce the problem::

  # download BZR:
  % rsync -av --delete bazaar-ng.org::bazaar-ng/bzr/bzr.dev .
  % cd bzr.dev/

  # replace the import lines:
  % perl -p -i -e 's/from cElementTree/from lxml.etree/g' bzrlib/*.py

  # run the tests:
  % python bzr selftest

The original import lines all have the form::

  from cElementTree import Element, ElementTree, SubElement

What am I doing wrong?

[1] http://bazaar-ng.org
[2] http://codespeak.net/lxml/compatibility.html

Best,
--
Olivier
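As an aside to the import swap described above, a common way to try an alternative implementation without rewriting every import line is a guarded import. This is only a sketch and not something from the BZR source: the fallback here is the standard library's xml.etree.ElementTree (an assumption, chosen so the snippet runs without lxml installed), and the element names are made up for illustration.

```python
# Prefer lxml.etree when it is installed, otherwise fall back to the
# standard library implementation (assumed fallback, not from the thread).
try:
    from lxml.etree import Element, ElementTree, SubElement
except ImportError:
    from xml.etree.ElementTree import Element, ElementTree, SubElement

# Hypothetical document, built only with calls both implementations share.
root = Element("inventory")
SubElement(root, "entry", name="README")
tree = ElementTree(root)
```

Code that sticks to the shared subset of the API (Element, SubElement, getroot and so on) should behave the same with either import; anything beyond that subset needs checking case by case.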
Olivier Grisel wrote:
I wanted to try to replace cElementTree by lxml.etree in the promising bazaar-ng SCM tool [1],
Cool! I didn't know it was using ElementTree. Thanks for trying this! What motivated you to try this?
but it seems to me there is an API compatibility issue.
This is quite possible.
I read the compatibility doc [2] so as to replace the import lines in the BZR source code, but then when I run the tests, I get::
AttributeError: 'etree._ElementTree' object has no attribute 'parse'
Yup, this is a known (by me :) weakness in the lxml implementation -- not all of the API of ElementTree is supported (yet). What 'parse' on the ElementTree class does is *replace* the contents of an XML document with a newly parsed tree. There's in fact some commented-out, completely bogus code for it in etree.pyx:

##    def parse(self, source, parser=None):
##        # XXX ignore parser for now
##        cdef xmlDoc* c_doc
##        c_doc = theParser.parseDoc(source)
##        result._c_doc = c_doc
##        return self.getroot()

I recall I didn't finish this implementation as I thought about the scary consequences of doing this. Perhaps I will give it another stab..

[snip]
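To make the 'replace the contents' behaviour concrete, here is a minimal sketch using the standard library's xml.etree.ElementTree, which implements the ElementTree API under discussion. It is not lxml code, and the sample documents are invented for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Start with a tree that already holds a document.
tree = ET.ElementTree(ET.fromstring("<old><child/></old>"))
old_root = tree.getroot()

# Calling parse() on the existing tree loads a new document into it
# and returns the new root element.
new_root = tree.parse(io.BytesIO(b"<new><a/><b/></new>"))

# The tree now points at the new root; the old root still exists as a
# plain Python object, but the tree no longer refers to it.
print(tree.getroot().tag)
print(old_root.tag)
```

The old root surviving as a detached element is exactly the kind of stray node that makes the same operation harder to reason about on top of a C-level document.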
What am I doing wrong?
Nothing, I'm afraid. I should extend the compatibility text with this information. It's unfortunately very hard to support some ElementTree features on top of libxml2, so switching over the imports will not work for all applications and will likely require some understanding of the code...

I hope I can resolve some of these issues with Fredrik Lundh eventually so we can nail down more firmly what the ElementTree API is.

Regards,
Martijn
Martijn Faassen wrote:
I hope I can resolve some of these issues with Fredrik Lundh eventually so we can nail down more firmly what the ElementTree API is.
A PEP defining that API (similar to what PEP 333 is to WSGI) would be awesome. Just something to think about...

Philipp
Martijn Faassen wrote:
Cool! I didn't know it was using ElementTree. Thanks for trying this! What motivated you to try this?
I found both bzr and lxml great technologies and I just wanted to play with them together. I really like the pythonish API of elementtree and I wanted to benchmark lxml.etree by running the test suite of bzr and comparing the results with those of cElementTree on some 'real world' use cases.
not all of the API of ElementTree is supported (yet). What 'parse' on the ElementTree class does is *replace* the contents of an XML document with a newly parsed tree. [snip] I recall I didn't finish this implementation as I thought about the scary consequences of doing this. Perhaps I will give it another stab..
Thanks. What are the 'scary consequences' of replacing the document with a newly parsed tree?
Nothing, I'm afraid. I should extend the compatibility text with this information. It's unfortunately very hard to support some ElementTree features on top of libxml2, so switching over the imports will not work for all applications and will likely require some understanding of the code... I hope I can resolve some of these issues with Fredrik Lundh eventually so we can nail down more firmly what the ElementTree API is.
Great. Fredrik Lundh regularly posts on the bzr ML, so he might be aware of the XML needs and architecture of BZR.

Best,
--
Olivier
Olivier Grisel wrote:
Martijn Faassen wrote:
Cool! I didn't know it was using ElementTree. Thanks for trying this! What motivated you to try this?
I found both bzr and lxml great technologies and I just wanted to play with them together. I really like the pythonish API of elementtree and I wanted to benchmark lxml.etree by running the test suite of bzr and comparing the results with those of cElementTree on some 'real world' use cases.
Yes, I'm curious to see what the results would be like. I wouldn't be surprised if cElementTree was faster -- lxml code *can* be faster but typically only when special features of lxml are used, such as XPath. Still, I wouldn't expect it to be that much slower either.
not all of the API of ElementTree is supported (yet). What 'parse' on the ElementTree class does is *replace* the contents of an XML document with a newly parsed tree. [snip] I recall I didn't finish this implementation as I thought about the scary consequences of doing this. Perhaps I will give it another stab..
Thanks. What are the 'scary consequences' of replacing the document with a newly parsed tree?
I think what I was worried about was stray nodes floating about still connected to a tree (which is also a stray). There's probably a way around this; we could replace the root element with a new one, disconnecting the previous root, and it may just all work (including garbage collection). I just remember getting scared when I was thinking about how to implement it. :) I need to think about it some more...
Nothing, I'm afraid. I should extend the compatibility text with this information. It's unfortunately very hard to support some ElementTree features on top of libxml2, so switching over the imports will not work for all applications and will likely require some understanding of the code... I hope I can resolve some of these issues with Fredrik Lundh eventually so we can nail down more firmly what the ElementTree API is.
Great. Fredrik Lundh regularly posts on the bzr ML, so he might be aware of the XML needs and architecture of BZR.
Yeah, Fredrik and I chat every once in a while, but I still need him to answer this issue. :)

Regards,
Martijn
Martijn Faassen wrote:
Thanks. What are the 'scary consequences' of replacing the document with a newly parsed tree?
I think what I was worried about was stray nodes floating about still connected to a tree (which is also a stray). There's probably a way around this; we could replace the root element with a new one, disconnecting the previous root
that's a perfectly valid approach.
and it may just all work (including garbage collection). I just remember getting scared when I was thinking about how to implement it. :) I need to think about it some more...
without looking at the code, I assume your ElementTree type holds a reference to some libxml-level document object. if so, parsing into an ElementTree shouldn't be much different *at the libxml level* than creating an empty tree, parsing into another tree, and deleting the first. </F>
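The approach discussed above (parse into a separate tree, then swap the root and let the old one go) can be sketched at the Python level as follows. This uses the standard library's xml.etree.ElementTree rather than lxml internals; `parse_into` is a hypothetical helper name, and `_setroot` is the method the ElementTree API provides for replacing a tree's root. How lxml would do the equivalent at the libxml2 level is not shown here.

```python
import io
import xml.etree.ElementTree as ET


def parse_into(tree, source):
    """Hypothetical helper: re-point an existing tree at a newly parsed document."""
    fresh = ET.parse(source)        # parse into a separate, temporary tree
    tree._setroot(fresh.getroot())  # swap in the new root; the tree drops the old one
    return tree.getroot()


tree = ET.ElementTree(ET.Element("old"))
stale_root = tree.getroot()
new_root = parse_into(tree, io.BytesIO(b"<new><item/></new>"))
```

After the swap, `stale_root` is only kept alive by the variable that still references it; once that reference goes away, ordinary garbage collection can reclaim it.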
participants (4)
- Fredrik Lundh
- Martijn Faassen
- Olivier Grisel
- Philipp von Weitershausen