replacing xml elements with other elements using lxml
Ultrus
owntheweb at gmail.com
Wed Aug 29 19:04:15 EDT 2007
Stefan,
I'm honored by your response.
You are correct about the bad XML. I had tried to shorten it for this
example, since the real document contains other tags unrelated to this
issue. Based on your feedback, I was able to write the following fully
functional code using some different techniques:
from lxml import etree
from StringIO import StringIO
import random

sourceXml = "\
<theroot>\
<contents>Stefan's fortune cookie:</contents>\
<random>\
<item>\
<random>\
<item>\
<contents>You will always know love.</contents>\
</item>\
<item>\
<contents>You will spend it all in one place.</contents>\
</item>\
</random>\
</item>\
<item>\
<contents>Your life comes with a lifetime warranty.</contents>\
</item>\
</random>\
<contents>The end.</contents>\
</theroot>"

# recover=True lets the parser tolerate malformed input
parser = etree.XMLParser(ns_clean=True, recover=True,
                         remove_blank_text=True, remove_comments=True)
tree = etree.parse(StringIO(sourceXml), parser)
xml = tree.getroot()

def reduceRandoms(xml):
    # Replace each <random> child with the first child of one randomly
    # chosen <item>, then rescan in case that child is itself a <random>.
    for elem in xml:
        if elem.tag == "random":
            elem.getparent().replace(elem, random.choice(elem)[0])
            reduceRandoms(xml)

reduceRandoms(xml)

for elem in xml:
    print elem.tag, ":", elem.text
One challenge I face now is that I can only replace a <random>
element with a single element. This isn't a problem if the chosen
<item> has only one <contents> element, or just one <random> element
(this works above). However, if an <item> has more than one child
element, such as a <contents> element followed by a <random> element
(like the children of <theroot>), only the first child is used.
Any thoughts on how to replace+append after the replaced element, or
clear+append multiple elements to the cleared position?
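
One direction that might work for the multi-element case, offered as a
rough sketch rather than a settled answer: look up the position of the
<random> element in its parent, remove it, and insert every child of
the chosen <item> at that position, one after another. The sketch is
written for Python 3 (unlike the code above) and uses only index-based
access, remove() and insert(), so it falls back to the standard
library's ElementTree if lxml is not installed. The function name
expand_randoms, its rng parameter, and the "Today."/"Tomorrow." items
are made up for illustration.

```python
import random

try:
    from lxml import etree  # the library discussed in this thread
except ImportError:
    # Assumption: stdlib fallback; only calls common to both are used below.
    import xml.etree.ElementTree as etree

# Hypothetical sample: the first <item> holds a <contents> AND a <random>.
SOURCE_XML = (
    "<theroot>"
    "<contents>Stefan's fortune cookie:</contents>"
    "<random>"
    "<item>"
    "<contents>You will always know love.</contents>"
    "<random>"
    "<item><contents>Today.</contents></item>"
    "<item><contents>Tomorrow.</contents></item>"
    "</random>"
    "</item>"
    "<item>"
    "<contents>Your life comes with a lifetime warranty.</contents>"
    "</item>"
    "</random>"
    "<contents>The end.</contents>"
    "</theroot>"
)


def expand_randoms(parent, rng=random):
    """Replace each <random> under parent with ALL children of one chosen <item>."""
    i = 0
    while i < len(parent):
        child = parent[i]
        if child.tag == "random":
            items = list(child)
            # Snapshot the replacement children before touching the tree.
            replacements = list(rng.choice(items)) if items else []
            parent.remove(child)
            for offset, new_child in enumerate(replacements):
                parent.insert(i + offset, new_child)
            # Do not advance i: the element now at position i may itself
            # be a <random> that still needs expanding.
        else:
            expand_randoms(child, rng)
            i += 1


root = etree.fromstring(SOURCE_XML)
expand_randoms(root)
for elem in root:
    print(elem.tag, ":", elem.text)
```

Because the loop re-examines the same position after each splice, a
<random> that arrives as one of the inserted children is expanded in
turn, so no separate restart from the root is needed.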
Thanks again :)