Is there a way to merge two XML files via Python?

Stefan Behnel stefan_ml at behnel.de
Thu Jan 12 11:39:06 EST 2012


J, 12.01.2012 17:04:
> This is more a theory exercise and something I'm trying to figure out,
> and this is NOT a homework assignment...
> 
> I'm trying to make a tool I use at work more efficient :)
> 
> So this is a test tool that generates an XML file as its output that
> is eventually used by a web service to display test results and system
> information.
> 
> The problem is that the testing is broken down into two different runs:
> Functional and Automated where the Functional tests are all manual,
> then the automated tests are run separately, usually overnight.
> 
> Each of those test runs generates essentially an identical XML file.
> What I want to learn is a way to merge them.

Ok - how large are these files? (i.e., do they easily fit into memory?)


> In abstract terms, the idea is essentially to diff the two files
> creating a patch and then use that patch to merge the two files into a
> single XML file.

I wouldn't go through a patch. If they fit into memory, just load both, merge
one into the other while eliminating duplicates, and save the result.
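
A minimal sketch of that in-memory merge, using the stdlib ElementTree
module. The report layout (a <results> root holding <test> elements that
are identified by an "id" attribute) and the name merge_reports are
assumptions made for illustration; the real format was not posted.

```python
import xml.etree.ElementTree as ET

# Hypothetical report format: <results><test id="..." status="..."/></results>
FUNCTIONAL = '<results><test id="t1" status="pass"/><test id="t2" status="fail"/></results>'
AUTOMATED = '<results><test id="t2" status="fail"/><test id="t3" status="pass"/></results>'

def merge_reports(base_root, other_root):
    """Append children of other_root to base_root, skipping ids already present."""
    seen = {child.get("id") for child in base_root}
    # Iterate over a copy of the child list so appending elsewhere is safe.
    for child in list(other_root):
        if child.get("id") not in seen:
            seen.add(child.get("id"))
            base_root.append(child)
    return base_root

merged = merge_reports(ET.fromstring(FUNCTIONAL), ET.fromstring(AUTOMATED))
merged_ids = [t.get("id") for t in merged]
print(ET.tostring(merged, encoding="unicode"))
```

With real files, ET.parse(path).getroot() would replace ET.fromstring(),
and ET.ElementTree(merged).write(path) would save the result.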

Or rather, load just one and process the other one incrementally using
ElementTree's iterparse().
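
A rough sketch of the incremental variant, under the same assumed layout
(flat <test> elements keyed by an "id" attribute). The function name
merge_incrementally and the sample data are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical data; in practice these would be the two report files.
BASE = b'<results><test id="t1"/><test id="t2"/></results>'
INCOMING = b'<results><test id="t2"/><test id="t3"/><test id="t4"/></results>'

def merge_incrementally(base_root, source):
    """Stream <test> elements from source into base_root, skipping known ids."""
    seen = {el.get("id") for el in base_root.iter("test")}
    added = 0
    # "end" events fire once an element has been parsed completely.
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "test":
            continue
        if elem.get("id") in seen:
            # Duplicate: drop its attributes and children to keep memory low.
            elem.clear()
        else:
            seen.add(elem.get("id"))
            base_root.append(elem)
            added += 1
    return added

base_root = ET.fromstring(BASE)
added = merge_incrementally(base_root, io.BytesIO(INCOMING))
final_ids = [t.get("id") for t in base_root]
print(added, final_ids)
```

Only the first file is held as a complete tree here; the second one is
consumed element by element, and iterparse() also accepts a filename
instead of a file object.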


> So what I was hoping I could get pointers on from those of you who are
> experienced in using Python with XML is what Python libraries or means
> are there for working with XML files specifically, and how easy or
> difficult would this be?

Depends on how easy it is to recognise duplicates in your specific data
format. Once you've managed to do that, the rest is trivial.
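
As an illustration of what "recognising duplicates" could look like, here
is one possible identity function. It assumes that two elements count as
duplicates when tag, attributes and stripped text all match; that rule and
the name element_key are guesses, and the real data may need a different
one (for example, a single id attribute).

```python
import xml.etree.ElementTree as ET

def element_key(elem):
    """Build a hashable identity from tag, sorted attributes and stripped text."""
    return (
        elem.tag,
        tuple(sorted(elem.attrib.items())),
        (elem.text or "").strip(),
    )

# Hypothetical elements: a and b differ only in attribute order and whitespace.
a = ET.fromstring('<test name="boot" suite="smoke">pass</test>')
b = ET.fromstring('<test suite="smoke" name="boot"> pass </test>')
c = ET.fromstring('<test name="boot" suite="smoke">fail</test>')

same_ab = element_key(a) == element_key(b)
same_ac = element_key(a) == element_key(c)
print(same_ab, same_ac)
```

Because the key is a tuple of strings, it can go straight into a set or be
used as a dict key while merging.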


> I'm also doing research on my own in my spare time on this, but I also
> wanted to ask here to get the opinion of developers who are more
> experienced in working with XML than I am.

I recommend looking at the stdlib xml.etree.ElementTree module or the
external lxml package (which contains the ElementTree compatible lxml.etree
module). The latter will (likely) make things easier due to full XPath
support and some other goodies, but ElementTree is also quite quick and
easy to use by itself.
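
To give a feel for the API, here is a small sketch of lookups with the
stdlib module, which supports a limited subset of XPath in find() and
findall(). The document structure and attribute names are invented for the
example. With lxml, the xpath() method on elements accepts full XPath
expressions instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical report with one <suite> per test run.
root = ET.fromstring(
    '<results>'
    '<suite name="functional">'
    '<test id="t1" status="pass"/><test id="t2" status="fail"/>'
    '</suite>'
    '<suite name="automated"><test id="t3" status="fail"/></suite>'
    '</results>'
)

# All failed tests anywhere below the root, in document order.
failed_ids = [t.get("id") for t in root.findall(".//test[@status='fail']")]

# A single suite selected by attribute value, then its direct children.
automated = root.find("./suite[@name='automated']")
automated_ids = [t.get("id") for t in automated.findall("test")]

print(failed_ids, automated_ids)
```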

Stefan



