[Tutor] removing xml elements with ElementTree

street.sweeper at mailworks.org
Wed Mar 20 01:06:00 EDT 2019


An opportunity to work in Python, and the necessity of working with some XML too large to visualize, got me thinking about an answer Alan Gauld had written to me a few years ago (https://mail.python.org/pipermail/tutor/2015-June/105810.html).  I have applied that information in this script, but I have another question :)

Let's say I have an xml file like this:

-------------- order.xml ----------------

<salesorder>
    <customername>Bob</customername>
    <customerlocation>321 Main St</customerlocation>
    <saleslines>
        <salesline>
            <item>D20</item>
            <quantity>4</quantity>
        </salesline>
        <salesline>
            <item>CS211</item>
            <quantity>1</quantity>
        </salesline>
        <salesline>
            <item>BL5</item>
            <quantity>7</quantity>
        </salesline>
        <salesline>
            <item>AC400</item>
            <quantity>1</quantity>
        </salesline>
    </saleslines>
</salesorder>

---------- end order.xml ----------------

Items CS211 and AC400 are not valid items, and I want to remove their <salesline> nodes.  I came up with the following (Python 3.6.7 on Linux):

------------ xml_delete_test.py --------------------

import os
import xml.etree.ElementTree as ET

hd = os.path.expanduser('~')
inputxml = os.path.join(hd,'order.xml')
outputxml = os.path.join(hd,'fixed_order.xml')

valid_items = ['D20','BL5']

tree = ET.parse(inputxml)
root = tree.getroot()
saleslines = root.find('saleslines').findall('salesline')
for e in saleslines[:]:
    if e.find('item').text not in valid_items:
        saleslines.remove(e)

tree.write(outputxml)

---------- end xml_delete_test.py ------------------

The above code runs without error, but simply writes the original file to disk.  The desired output would be:

-------------- fixed_order.xml ----------------

<salesorder>
    <customername>Bob</customername>
    <customerlocation>321 Main St</customerlocation>
    <saleslines>
        <salesline>
            <item>D20</item>
            <quantity>4</quantity>
        </salesline>
        <salesline>
            <item>BL5</item>
            <quantity>7</quantity>
        </salesline>
    </saleslines>
</salesorder>

---------- end fixed_order.xml ----------------

What I find particularly confusing is that after running xml_delete_test.py in the IDLE editor, if I go over to the shell and type saleslines, I can see that it is now a list of two elements.  I then run the following:

for i in saleslines:
    print(i.find('item').text)

and I see that it's D20 and BL5, my two valid items.  Yet when I write tree out to the disk, it has the original four.  Do I need to refresh tree somehow?
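My best guess so far, as a minimal sketch rather than a definitive answer: findall() returns an ordinary Python list, so removing entries from that list shrinks the list but leaves the tree itself alone. Calling remove() on the parent <saleslines> element instead should change the tree that gets written. The inline XML string below is a shortened stand-in for order.xml, and the names parent and kept are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened inline stand-in for order.xml (illustration only).
xml_text = """<salesorder>
    <saleslines>
        <salesline><item>D20</item><quantity>4</quantity></salesline>
        <salesline><item>CS211</item><quantity>1</quantity></salesline>
        <salesline><item>BL5</item><quantity>7</quantity></salesline>
        <salesline><item>AC400</item><quantity>1</quantity></salesline>
    </saleslines>
</salesorder>"""

valid_items = ['D20', 'BL5']

root = ET.fromstring(xml_text)
parent = root.find('saleslines')

# findall() builds a separate list, so it is safe to loop over it
# while removing the matching children from the parent element.
for e in parent.findall('salesline'):
    if e.find('item').text not in valid_items:
        parent.remove(e)  # removes the node from the tree itself

# Re-query the tree to see which items are still attached to it.
kept = [e.find('item').text for e in parent.findall('salesline')]
print(kept)
```

In the real script, the same change would mean keeping a reference to root.find('saleslines') and calling remove() on that element before tree.write(outputxml).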

Thanks!

