
Jasper Timmer, 29.07.2013 18:34:
I am trying to diff some files in a company-internal XML format with lxml and write the diff back to disk as XML. I do a double diff: file1+file2 and file3+file2, and then I diff the two diff results again and write that back to disk. I'm not using the HTML diff; I do the work myself.
In the end I do:

    etree.ElementTree(diff1).write("diff1.xml", pretty_print=True, xml_declaration=True)
    etree.ElementTree(diff2).write("diff2.xml", pretty_print=True, xml_declaration=True)
    result.addprevious(etree.PI('xml-stylesheet', 'type="text/xsl" href="uberdiff.xsl"'))
    resultTree = etree.ElementTree(result)
    resultTree.write("Udiff.xml", pretty_print=True, xml_declaration=True)

On inspection, the root of diff2 contains the children of the root of diff1 plus its own children, and Udiff contains the children of diff2 (including those of diff1!) plus its own. I have been trying to debug this for days and still can't find the cause.
I'm wondering whether I'm leaking this myself or whether there is a bug or some internal behaviour in lxml that I do not understand yet. Please take a look at my code at: https://github.com/Y3PP3R/difftooling
Most likely not a bug in lxml. Without wading through your complete code base (you didn't point to a specific module and didn't provide a working example snippet to test), I'd suspect that your diffing code either doesn't create distinct trees for diff1 and diff2 (i.e. both Elements live in the same document), or you are deep copying stuff around and end up copying more than you wanted.
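To make the two suspicions above concrete, here is a minimal sketch. It is not taken from the difftooling repository; the sample XML and the helper name top_ancestor are made up for illustration. It relies on two lxml behaviours: append() moves an element that already has a parent instead of copying it, and copy.deepcopy() copies an element together with its whole subtree.

```python
import copy
from lxml import etree


def top_ancestor(element):
    """Hypothetical helper: walk up via getparent() to the topmost element."""
    while element.getparent() is not None:
        element = element.getparent()
    return element


# Made-up input; the real files use a company-internal format.
source = etree.fromstring("<root><item id='1'/><item id='2'/></root>")

# append() MOVES an element that already has a parent: the first <item>
# leaves <root> and now lives under diff1.
diff1 = etree.Element("diff")
diff1.append(source[0])

# deepcopy() leaves the original in place, but it copies the complete
# subtree of whatever is passed in, so copy only the element that is needed.
diff2 = etree.Element("diff")
diff2.append(copy.deepcopy(source[0]))

# Two results are distinct trees only if their topmost ancestors differ.
distinct = top_ancestor(diff1) is not top_ancestor(diff2)
print(len(source), len(diff1), len(diff2), distinct)
```

Running the same top_ancestor() comparison on the real diff1, diff2 and result elements just before the write() calls would show whether they end up in one document.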
In any case, describing your actual approach (not just the final result) would make it easier for us to follow you, and would most likely already help you fix it yourself.
Stefan