[Tutor] removing xml elements with ElementTree

Peter Otten __peter__ at web.de
Wed Mar 20 09:32:19 EDT 2019


street.sweeper at mailworks.org wrote:

> An opportunity to work in Python, and the necessity of working with some
> XML too large to visualize, got me thinking about an answer Alan Gauld had
> written to me a few years ago
> (https://mail.python.org/pipermail/tutor/2015-June/105810.html).  I have
> applied that information in this script, but I have another question :)
> 
> Let's say I have an xml file like this:
> 
> -------------- order.xml ----------------
> 
> <salesorder>
>     <customername>Bob</customername>
>     <customerlocation>321 Main St</customerlocation>
>     <saleslines>
>         <salesline>
>             <item>D20</item>
>             <quantity>4</quantity>
>         </salesline>
>         <salesline>
>             <item>CS211</item>
>             <quantity>1</quantity>
>         </salesline>
>         <salesline>
>             <item>BL5</item>
>             <quantity>7</quantity>
>         </salesline>
>         <salesline>
>             <item>AC400</item>
>             <quantity>1</quantity>
>         </salesline>
>     </saleslines>
> </salesorder>
> 
> ---------- end order.xml ----------------
> 
> Items CS211 and AC400 are not valid items, and I want to remove their
> <salesline> nodes.  I came up with the following (python 3.6.7 on linux):
> 
> ------------ xml_delete_test.py --------------------
> 
> import os
> import xml.etree.ElementTree as ET
> 
> hd = os.path.expanduser('~')
> inputxml = os.path.join(hd,'order.xml')
> outputxml = os.path.join(hd,'fixed_order.xml')
> 
> valid_items = ['D20','BL5']
> 
> tree = ET.parse(inputxml)
> root = tree.getroot()
> saleslines = root.find('saleslines').findall('salesline')
> for e in saleslines[:]:
>     if e.find('item').text not in valid_items:
>         saleslines.remove(e)
> 
> tree.write(outputxml)
> 
> ---------- end xml_delete_test.py ------------------
> 
> The above code runs without error, but simply writes the original file to
> disk.  The desired output would be:
> 
> -------------- fixed_order.xml ----------------
> 
> <salesorder>
>     <customername>Bob</customername>
>     <customerlocation>321 Main St</customerlocation>
>     <saleslines>
>         <salesline>
>             <item>D20</item>
>             <quantity>4</quantity>
>         </salesline>
>         <salesline>
>             <item>BL5</item>
>             <quantity>7</quantity>
>         </salesline>
>     </saleslines>
> </salesorder>
> 
> ---------- end fixed_order.xml ----------------
> 
> What I find particularly confusing about the problem is that after running
> xml_delete_test.py in the Idle editor, if I go over to the shell and type
> saleslines, I can see that it's now a list of two elements.  I run the
> following:
> 
> for i in saleslines:
>     print(i.find('item').text)
> 
> and I see that it's D20 and BL5, my two valid items.  Yet when I write
> tree out to the disk, it has the original four.  Do I need to refresh tree
> somehow?
> 
> Thanks!

First of all, thank you for this clear and complete problem description!

> saleslines = root.find('saleslines').findall('salesline')

Here findall() returns a new list of matches. The list holds the same element
objects as the tree, but the list itself is completely independent of the
element tree. Therefore

>         saleslines.remove(e)

will remove the element e from this independent list, and only from that.
To remove an element from the tree you have to know its parent, and then

parent_element.remove(child_element)

will actually modify the tree. 
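
To see the difference in isolation, here is a minimal sketch. The two-line
document and the variable names are made up for illustration; only findall(),
Element.remove() and len() on an element come from ElementTree itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened stand-in for order.xml
root = ET.fromstring(
    "<saleslines>"
    "<salesline><item>D20</item></salesline>"
    "<salesline><item>CS211</item></salesline>"
    "</saleslines>"
)

matches = root.findall("salesline")  # an ordinary Python list
matches.remove(matches[1])           # shrinks the list only
list_len = len(matches)
tree_len_before = len(root)          # len(element) counts its children

root.remove(root[1])                 # Element.remove() changes the tree
tree_len_after = len(root)

print(list_len, tree_len_before, tree_len_after)
```

After the list operation the list has shrunk but the tree still has both
children; only the call on the parent element changes what tree.write()
would produce.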

In your case the parent is always <saleslines>, so you can restrict yourself 
to its children:

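saleslines = root.find('saleslines')  # the parent element, not a list
# findall() returns a separate list, so removing while looping is safe
for e in saleslines.findall('salesline'):
    if e.find('item').text not in valid_items:
        saleslines.remove(e)  # Element.remove() modifies the tree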
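
If the parent is not known in advance, one common workaround is to build a
child-to-parent mapping first, because ElementTree elements do not keep a
reference to their parent. The following is only a sketch under that
assumption: the function name remove_invalid_saleslines and the inline
ORDER_XML string are hypothetical, and the XML is a condensed copy of your
order.xml.

```python
import xml.etree.ElementTree as ET

# Condensed copy of order.xml, inlined so the example is self-contained
ORDER_XML = """
<salesorder>
    <customername>Bob</customername>
    <saleslines>
        <salesline><item>D20</item><quantity>4</quantity></salesline>
        <salesline><item>CS211</item><quantity>1</quantity></salesline>
        <salesline><item>BL5</item><quantity>7</quantity></salesline>
        <salesline><item>AC400</item><quantity>1</quantity></salesline>
    </saleslines>
</salesorder>
"""


def remove_invalid_saleslines(root, valid_items):
    """Remove every <salesline> whose <item> text is not in valid_items."""
    # Map each child element to its parent before changing anything.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    # findall() returns a list, so the tree can be modified inside the loop.
    for line in root.findall(".//salesline"):
        if line.find("item").text not in valid_items:
            parent_of[line].remove(line)


root = ET.fromstring(ORDER_XML)
remove_invalid_saleslines(root, {"D20", "BL5"})
remaining = [item.text for item in root.iter("item")]
print(remaining)
```

For your file this is more machinery than you need, since every <salesline>
has the same parent, but it may help when the elements to delete are
scattered across the document.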



