Stefan Behnel, 01.11.2012 22:46:
James Housden, 30.10.2012 20:35:
I have a large xml file that I need to modify and then store as a new xml file.
The file has a structure similar to

    <root>
      <header>
        <txt>header txt</txt>
      </header>
      <record>
        <field1>1.0</field1>
        <subrecord>
          <field2>A1</field2>
          <field3>C1</field3>
        </subrecord>
      </record>
      <record>
        <field1>1.0</field1>
        <subrecord>
          <field2>A2</field2>
          <field3>C3</field3>
        </subrecord>
      </record>
      <record>
        <field1>1.0</field1>
        <subrecord>
          <field2>A4</field2>
          <field3>B</field3>
        </subrecord>
      </record>
    </root>
I would like to modify the contents of the field3 tags.
Now, due to the file size, I cannot load the complete document into memory and so I intend to use 'iterparse'. Traversing the document and updating the fields is no problem. What I am not sure about is how to write the modified data to a new xml file. The root tag is only complete when I have processed the complete file. What I need to do is write the start of the root tag (<root>), then write the header and the records, and finally the end of the root tag (</root>). Is there functionality in lxml to do this or should I use standard Python writes for the initial <root> and final </root>?
It's certainly easiest to just write out the root tag yourself. Take care of encodings in that case - as long as you only use UTF-8, you should be fine. Otherwise, you also have to write out an appropriate XML declaration before the root element and properly get the serialised XML elements into the file.
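As a rough illustration of writing the root tag by hand, here is a minimal sketch. It uses the standard library's xml.etree.ElementTree, whose iterparse and tostring mirror the lxml.etree functions of the same name, so that it runs without extra dependencies. The function name `rewrite_field3` and the `transform` callback are hypothetical, and the sketch assumes a root element without attributes or namespaces and UTF-8 output.

```python
import io
import xml.etree.ElementTree as ET


def rewrite_field3(source, out, transform):
    """Stream `source` to the text stream `out`, applying `transform`
    to the text of every <field3>. Hypothetical helper, not lxml API."""
    out.write('<?xml version="1.0" encoding="UTF-8"?>\n')
    depth = 0
    root = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            depth += 1
            if depth == 1:
                # First start event: write the root start tag by hand.
                # Assumption: the root has no attributes or namespaces.
                root = elem
                out.write("<%s>\n" % root.tag)
            continue
        depth -= 1
        if depth == 1:
            # A direct child of the root (header or record) is complete.
            for field in elem.iter("field3"):
                if field.text is not None:
                    field.text = transform(field.text)
            # The tail may not have been parsed yet at the end event,
            # so normalise it to one newline per child.
            elem.tail = "\n"
            out.write(ET.tostring(elem, encoding="unicode"))
            # Detach the finished child so the tree does not keep growing.
            root.remove(elem)
    out.write("</%s>\n" % root.tag)


if __name__ == "__main__":
    sample = io.BytesIO(
        b"<root><header><txt>header txt</txt></header>"
        b"<record><field1>1.0</field1><subrecord><field2>A1</field2>"
        b"<field3>C1</field3></subrecord></record></root>"
    )
    result = io.StringIO()
    rewrite_field3(sample, result, str.lower)
    print(result.getvalue())
```

For a real file, `out` would be opened with `open(path, "w", encoding="utf-8")` so that the bytes on disk match the encoding named in the declaration.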
Actually, I'd love to see someone implement a magic API like this:

    # open an "XMLFile" object that knows about XML serialisation
    with xmlfile("somefile.xml", encoding='utf-8') as xf:
        # generate an element (the root element)
        with xf.Element('root-tag') as root_element:
            # generate content, e.g. through iterparse
            for element in generate_some_elements():
                # serialise generated elements into the XML file
                xf.write(element)

That looks like it should be totally trivial to do, but would make the above use case way simpler and safer.

Stefan
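To make the wished-for API a little more concrete, the following is one possible sketch of how it might be built from context managers. Everything here is hypothetical: `xmlfile` and `XMLFileSketch` are illustrative names, not an existing lxml API, and the sketch relies on the standard library's xml.etree.ElementTree for serialisation. It does not escape or validate tag names and ignores attributes and namespaces.

```python
import xml.etree.ElementTree as ET
from contextlib import contextmanager


class XMLFileSketch:
    """Hypothetical incremental XML writer around an open text stream."""

    def __init__(self, stream):
        self._stream = stream

    @contextmanager
    def Element(self, tag):
        # Write the start tag on entry and the end tag on normal exit.
        self._stream.write("<%s>" % tag)
        yield tag
        self._stream.write("</%s>" % tag)

    def write(self, element):
        # Serialise one complete element into the currently open parent.
        self._stream.write(ET.tostring(element, encoding="unicode"))


@contextmanager
def xmlfile(path, encoding="utf-8"):
    # Open the file with the same encoding that the declaration names.
    with open(path, "w", encoding=encoding) as stream:
        stream.write('<?xml version="1.0" encoding="%s"?>\n' % encoding)
        yield XMLFileSketch(stream)
```

With these definitions, the loop from the message runs as written once `generate_some_elements()` yields complete elements, for example from an iterparse loop.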