[lxml-dev] Bug in memory management (I guess ;)
Hi!

The past couple of months I kept running into some bug. For some reason the namespace information of certain nodes got lost after moving the nodes from one document to another. I kept trying to isolate the problem, but without success, until this week I found that the reason the information gets lost seems to have to do with the document getting garbage collected even though some nodes are still referred to.

With this in mind I managed to write the following demonstration snippet:

#--------------------------------------------------------------------------
from lxml import etree

s1 = '<a xmlns:a="A:"><b a:b="b" /></a>'
tree1 = etree.fromstring(s1)
btag = tree1.xpath('//b')[0]
del tree1

s2 = '<x />'
tree2 = etree.fromstring(s2)
tree2.append(btag)

# this produces crap...
print etree.tostring(tree2)
#--------------------------------------------------------------------------

The problem seems to be the 'del' statement: this seems to free the document instance or something, even though there's still a reference to one of its nodes. Serializing the node later on will result in garbage. When the 'del' line is commented out, the XML is produced properly.

I hope this is enough information for you guys to track the bug: I'm no C programmer myself, so I don't dare diving into the Pyrex, and also I may have made some errors in my explanation... If you need more information, let me know.

Cheers,
Guido
Hi, Johnny deBris wrote:
The past couple of months I kept running into some bug. For some reason the namespace information of certain nodes got lost after moving the nodes from one document to another. I kept trying to isolate the problem, but without success, until this week I found that the reason the information gets lost seems to have to do with the document getting garbage collected even though some nodes are still referred to.
With this in mind I managed to write the following demonstration snippet:
#--------------------------------------------------------------------------
from lxml import etree
s1 = '<a xmlns:a="A:"><b a:b="b" /></a>'
tree1 = etree.fromstring(s1)
btag = tree1.xpath('//b')[0]
del tree1

s2 = '<x />'
tree2 = etree.fromstring(s2)
tree2.append(btag)

# this produces crap...
print etree.tostring(tree2)
#--------------------------------------------------------------------------
The problem seems to be the 'del' statement: this seems to free the document instance or something, even though there's still a reference to one of its nodes. Serializing the node later on will result in garbage. When the 'del' line is commented out, the XML is produced properly.
I hope this is enough information for you guys to track the bug: I'm no C programmer myself, so I don't dare diving into the Pyrex, and also I may have made some errors in my explanation... If you need more information, let me know.
Thanks for reporting this and thanks for taking the time to provide a test. This seems to be related to a bug that was recently discovered, but this test case tells me that it's actually different than I thought. I'll have another look at it.

Thanks,
Stefan
Hi, Johnny deBris wrote:
The past couple of months I kept running into some bug. For some reason the namespace information of certain nodes got lost after moving the nodes from one document to another. I kept trying to isolate the problem, but without success, until this week I found that the reason the information gets lost seems to have to do with the document getting garbage collected even though some nodes are still referred to. [...] I hope this is enough information for you guys to track the bug: I'm no C programmer myself, so I don't dare diving into the Pyrex, and also I may have made some errors in my explanation... If you need more information, let me know.
Thanks a lot. Bug is fixed in the trunk. Stefan
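For anyone who wants to check the fixed behaviour themselves, the reported scenario can be turned into a small regression check. This is only a sketch, assuming Python 3 and a reasonably recent lxml; the function name `move_node_and_serialize` is made up for illustration, and this is not the test that went into the trunk. It reparses the output rather than comparing strings, since where exactly the namespace declaration ends up in the serialized form is not something the thread specifies.

```python
import gc

from lxml import etree


def move_node_and_serialize():
    """Re-run the reported scenario and return the serialized target tree."""
    tree1 = etree.fromstring('<a xmlns:a="A:"><b a:b="b" /></a>')
    btag = tree1.xpath('//b')[0]

    # Drop the only direct reference to the source root, then force a
    # collection pass so that any premature cleanup would happen here.
    del tree1
    gc.collect()

    tree2 = etree.fromstring('<x />')
    tree2.append(btag)
    return etree.tostring(tree2)


if __name__ == "__main__":
    serialized = move_node_and_serialize()
    # Reparse the result: the moved element must still carry its
    # attribute in the "A:" namespace.
    moved = etree.fromstring(serialized).find('b')
    assert moved is not None
    assert moved.get('{A:}b') == 'b'
    print(serialized.decode())
```

If the namespace information were lost again, the reparse would either fail or the `{A:}b` lookup would come back empty, so the assertions would catch it.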
participants (2)
- Johnny deBris
- Stefan Behnel