[Tutor] removing xml elements with ElementTree
Peter Otten
__peter__ at web.de
Wed Mar 20 09:32:19 EDT 2019
street.sweeper at mailworks.org wrote:
> An opportunity to work in Python, and the necessity of working with some
> XML too large to visualize, got me thinking about an answer Alan Gauld had
> written to me a few years ago
> (https://mail.python.org/pipermail/tutor/2015-June/105810.html). I have
> applied that information in this script, but I have another question :)
>
> Let's say I have an xml file like this:
>
> -------------- order.xml ----------------
>
> <salesorder>
> <customername>Bob</customername>
> <customerlocation>321 Main St</customerlocation>
> <saleslines>
> <salesline>
> <item>D20</item>
> <quantity>4</quantity>
> </salesline>
> <salesline>
> <item>CS211</item>
> <quantity>1</quantity>
> </salesline>
> <salesline>
> <item>BL5</item>
> <quantity>7</quantity>
> </salesline>
> <salesline>
> <item>AC400</item>
> <quantity>1</quantity>
> </salesline>
> </saleslines>
> </salesorder>
>
> ---------- end order.xml ----------------
>
> Items CS211 and AC400 are not valid items, and I want to remove their
> <salesline> nodes. I came up with the following (python 3.6.7 on linux):
>
> ------------ xml_delete_test.py --------------------
>
> import os
> import xml.etree.ElementTree as ET
>
> hd = os.path.expanduser('~')
> inputxml = os.path.join(hd,'order.xml')
> outputxml = os.path.join(hd,'fixed_order.xml')
>
> valid_items = ['D20','BL5']
>
> tree = ET.parse(inputxml)
> root = tree.getroot()
> saleslines = root.find('saleslines').findall('salesline')
> for e in saleslines[:]:
>     if e.find('item').text not in valid_items:
>         saleslines.remove(e)
>
> tree.write(outputxml)
>
> ---------- end xml_delete_test.py ------------------
>
> The above code runs without error, but simply writes the original file to
> disk. The desired output would be:
>
> -------------- fixed_order.xml ----------------
>
> <salesorder>
> <customername>Bob</customername>
> <customerlocation>321 Main St</customerlocation>
> <saleslines>
> <salesline>
> <item>D20</item>
> <quantity>4</quantity>
> </salesline>
> <salesline>
> <item>BL5</item>
> <quantity>7</quantity>
> </salesline>
> </saleslines>
> </salesorder>
>
> ---------- end fixed_order.xml ----------------
>
> What I find particularly confusing about the problem is that after running
> xml_delete_test.py in the Idle editor, if I go over to the shell and type
> saleslines, I can see that it's now a list of two elements. I run the
> following:
>
> for i in saleslines:
>     print(i.find('item').text)
>
> and I see that it's D20 and BL5, my two valid items. Yet when I write
> tree out to the disk, it has the original four. Do I need to refresh tree
> somehow?
>
> Thanks!
First of all, thank you for this clear and complete problem description!
> saleslines = root.find('saleslines').findall('salesline')
Here findall() returns a new list of matches which is completely independent
of the element tree. Therefore
> saleslines.remove(e)
will remove the element e from this independent list, and only from that.
To remove an element from the tree you have to know its parent, and then
    parent_element.remove(child_element)
will actually modify the tree.
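To make the difference visible, here is a small sketch. The two-element
document is made up for illustration and is not the original order.xml. It
removes one item from the list returned by findall(), checks that the tree
is unchanged, and then removes a child through its parent:

```python
import xml.etree.ElementTree as ET

# A made-up, minimal document used only for this illustration.
root = ET.fromstring(
    "<saleslines>"
    "<salesline><item>D20</item></salesline>"
    "<salesline><item>CS211</item></salesline>"
    "</saleslines>"
)

matches = root.findall("salesline")  # an ordinary Python list
matches.remove(matches[1])           # shrinks the list only
list_len = len(matches)
tree_len_before = len(root)          # the tree still has both children

root.remove(root[1])                 # removes the child from the tree itself
tree_len_after = len(root)

print(list_len, tree_len_before, tree_len_after)  # 1 2 1
```

len(root) counts the direct children of an element, so it is a quick way to
check whether the tree itself has changed.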
In your case the parent is always <saleslines>, so you can restrict yourself
to its children:
saleslines = root.find('saleslines')
for e in saleslines.findall('salesline'):
    if e.find('item').text not in valid_items:
        saleslines.remove(e)
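Put together, a self-contained version might look like the sketch below. It
parses the order from a string instead of a file so it can be run as-is. The
helper name remove_invalid_saleslines and the constant ORDER_XML are made up
for this example; to work with real files you would use ET.parse() and
tree.write() as in your script.

```python
import xml.etree.ElementTree as ET

# Same data as order.xml in the question, held in a string (an assumption
# made so the example runs without touching the file system).
ORDER_XML = """<salesorder>
  <customername>Bob</customername>
  <customerlocation>321 Main St</customerlocation>
  <saleslines>
    <salesline><item>D20</item><quantity>4</quantity></salesline>
    <salesline><item>CS211</item><quantity>1</quantity></salesline>
    <salesline><item>BL5</item><quantity>7</quantity></salesline>
    <salesline><item>AC400</item><quantity>1</quantity></salesline>
  </saleslines>
</salesorder>"""


def remove_invalid_saleslines(root, valid_items):
    """Remove every <salesline> whose <item> text is not in valid_items."""
    parent = root.find("saleslines")
    # findall() returns a separate list, so removing children from
    # `parent` inside the loop does not disturb the iteration.
    for line in parent.findall("salesline"):
        if line.find("item").text not in valid_items:
            parent.remove(line)
    return root


root = ET.fromstring(ORDER_XML)
remove_invalid_saleslines(root, {"D20", "BL5"})

remaining = [item.text for item in root.iter("item")]
print(remaining)  # ['D20', 'BL5']
```

Because the loop iterates over the list returned by findall() rather than
over the parent element itself, no saleslines[:] copy is needed here.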