When writing to file, all ElementTrees are accumulated

Hi,

I am trying to diff some files in a company-internal XML format with lxml and write the diff back to disk as XML. I do a double diff: file1 against file2 and file3 against file2, and then I diff the two resulting diffs and write that to disk as well. I'm not using the HTML diff; I do the work myself. In the end I do:

    etree.ElementTree(diff1).write("diff1.xml", pretty_print=True, xml_declaration=True)
    etree.ElementTree(diff2).write("diff2.xml", pretty_print=True, xml_declaration=True)
    result.addprevious(etree.PI('xml-stylesheet', 'type="text/xsl" href="uberdiff.xsl"'))
    resultTree = etree.ElementTree(result)
    resultTree.write("Udiff.xml", pretty_print=True, xml_declaration=True)

On inspection, the root of diff2 contains the children of the root of diff1 plus its own children, and Udiff contains the children of diff2 (including those of diff1!) plus its own.

I have been trying to debug this for days and haven't found the cause yet. I'm wondering whether I'm leaking this myself or whether there is a bug or some internal behaviour in lxml that I don't understand yet. Please inspect my code at: https://github.com/Y3PP3R/difftooling

I'm working with Python 2.7.3 32-bit on Windows 7 x64 and the lxml 2.3 binaries from http://www.lfd.uci.edu/~gohlke/pythonlibs/#lxml

Thanks in advance for any pointers in the right direction.

Jasper Timmer

Jasper Timmer, 29.07.2013 18:34:
Most likely not a bug in lxml. Without wading through your complete code base (you didn't point to a specific module and didn't provide a working example snippet to test), I'd suspect that your diffing code either doesn't create distinct trees for diff1 and diff2 (i.e. both Elements live in the same document), or you are deep copying stuff around and end up copying more than you wanted.

In any case, describing your actual approach (not just the final result), so that we can more easily follow you, would most likely help you fix it yourself already.

Stefan

Thanks Stefan,

I made this mistake: http://stackoverflow.com/questions/1585247/python-optional-parameters in _get_diff in differs/diff_xml.py. I created the root node once, as a default parameter value, and kept adding to it. I only noticed once I had reduced the problem to a minimal case and found that rootNode was not empty on the second call. Because I thought I knew how Python worked and lxml was the new thing for me, I assumed the problem was related to lxml. Sorry for that.

Kind regards,
Jasper Timmer
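
For readers who land on this thread with the same symptom, here is a minimal sketch of the pitfall described above. The real signature of _get_diff is not shown in the thread, so the function names, the `names` parameter and the "diff" tag below are hypothetical. The sketch uses the standard library's xml.etree.ElementTree so it runs without extra packages; this is an assumption made for portability, and the same default-argument behaviour applies to elements created with lxml.etree.

```python
import xml.etree.ElementTree as ET


def get_diff_buggy(names, root=ET.Element("diff")):
    # Hypothetical stand-in for _get_diff. The default Element is created
    # once, when the function is defined, and is then shared by every call
    # that omits `root`.
    for name in names:
        ET.SubElement(root, name)
    return root


def get_diff_fixed(names, root=None):
    # Use None as the sentinel and build a fresh root on each call.
    if root is None:
        root = ET.Element("diff")
    for name in names:
        ET.SubElement(root, name)
    return root


# Two "independent" diffs end up in the same root element.
buggy1 = get_diff_buggy(["a", "b"])
buggy2 = get_diff_buggy(["c"])
same_root = buggy1 is buggy2
buggy_tags = [child.tag for child in buggy2]

# With the None sentinel each call gets its own root.
fixed1 = get_diff_fixed(["a", "b"])
fixed2 = get_diff_fixed(["c"])
fixed_tags = [child.tag for child in fixed2]

print(same_root, buggy_tags, fixed_tags)
```

In the buggy variant the second result carries the children of the first call as well, which matches the accumulation seen in diff2.xml and Udiff.xml. Checking `len(root)` at the top of the function is a quick way to confirm whether a supposedly new root already has children.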
