Frederik Elwert wrote on 09.06.2015 at 12:06:
I want to write a very large XML file to disc. Since I ran into memory issues using the regular ElementTree.write() method, I switched to using etree.xmlfile. Generally, it works quite well, but I ran into two issues. Here’s my test code:
----8<----
from lxml import etree

P_DATA = '{http://www.dspin.de/data}'
P_TEXT = '{http://www.dspin.de/data/textcorpus}'

with etree.xmlfile('test.xml', encoding='utf-8') as xf:
    with xf.element(P_DATA + 'D-Spin',
                    nsmap={None: 'http://www.dspin.de/data'}):
        with xf.element(P_TEXT + 'TextCorpus', lang='de',
                        nsmap={None: 'http://www.dspin.de/data/textcorpus'}):
            element = etree.Element(
                P_TEXT + 'tokens',
                nsmap={None: 'http://www.dspin.de/data/textcorpus'})
            element2 = etree.SubElement(element, P_TEXT + 'token')
            xf.write(element, pretty_print=True)
---->8----
And here’s the output:
----8<----
<D-Spin xmlns="http://www.dspin.de/data"><TextCorpus xmlns="http://www.dspin.de/data/textcorpus" lang="de"><tokens xmlns="http://www.dspin.de/data/textcorpus">
  <token/>
</tokens>
</TextCorpus></D-Spin>
---->8----
Now my questions are:
1. I had to add an nsmap argument to the creation of "element" in order to prevent an "ns0:" prefix in the output. But this led to a duplicate declaration of the default namespace 'http://www.dspin.de/data/textcorpus' on both <TextCorpus> and <tokens>.
Since the generation of the Elements that I write to the xmlfile happens somewhere else in the real code, it is a bit cumbersome to add nsmaps all over the place. And even then, I have the duplicated namespace declaration. So ideally I’d like xf.write() to be aware of the current namespace map defined by the xf.element. Is that possible?
Yes, that's a known issue currently. It's not easy to fix because when serialising subtrees, the serialiser state is essentially blank and doesn't know about previously written elements. I guess this could be worked around by faking a new parent element with all parent namespaces for the element that is being serialised. Not great, but might still work. Pull requests welcome.
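A possible user-side alternative, which is not the fake-parent approach described above, is to re-emit small subtrees through xf.element() instead of xf.write(). The incremental writer appears to keep track of the namespaces declared on elements that are still open, so nothing gets redeclared. This is only a minimal sketch: write_inherited is a hypothetical helper (not part of lxml), and it assumes element-only subtrees without comments or processing instructions.

```python
import io
from lxml import etree

NS_DATA = 'http://www.dspin.de/data'
NS_TEXT = 'http://www.dspin.de/data/textcorpus'
P_DATA = '{%s}' % NS_DATA
P_TEXT = '{%s}' % NS_TEXT


def write_inherited(xf, elem):
    """Hypothetical helper: replay an in-memory element via xf.element().

    Assumes the subtree contains only elements and text.
    """
    with xf.element(elem.tag, dict(elem.attrib)):
        if elem.text:
            xf.write(elem.text)
        for child in elem:
            write_inherited(xf, child)
            if child.tail:
                xf.write(child.tail)


out = io.BytesIO()
with etree.xmlfile(out, encoding='utf-8') as xf:
    with xf.element(P_DATA + 'D-Spin', nsmap={None: NS_DATA}):
        with xf.element(P_TEXT + 'TextCorpus', lang='de',
                        nsmap={None: NS_TEXT}):
            # No nsmap needed here: the namespace is resolved by the writer.
            tokens = etree.Element(P_TEXT + 'tokens')
            etree.SubElement(tokens, P_TEXT + 'token')
            write_inherited(xf, tokens)

result = out.getvalue().decode('utf-8')
print(result)
```

The trade-off is that pretty_print is not available on this path, and the subtree is walked in Python rather than handed to the serialiser in one call, so it is probably best suited to small elements.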
2. I can pass "pretty_print=True" to xf.write(), but it naturally only affects those sub-trees. Is it possible to pretty-print the elements generated by xf.element() as well? Maybe it would be nice to be able to pass pretty_print to etree.xmlfile() itself?
You can get a poor-human's slightly better pretty-printing by doing what you do above and additionally calling xf.write("\n") after each opening and closing element() block. While I would accept patches that implement a "pretty_print" flag for xmlfile() itself, as you proposed, I don't think it's going to be easy to make it work "as expected".

Stefan
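The xf.write("\n") suggestion could look roughly like the following sketch. It reuses the element names from the test code above and writes to an in-memory buffer instead of 'test.xml' so the result can be inspected; the exact placement of the newlines is an assumption about what reads best, not something xmlfile prescribes.

```python
import io
from lxml import etree

NS_DATA = 'http://www.dspin.de/data'
NS_TEXT = 'http://www.dspin.de/data/textcorpus'
P_DATA = '{%s}' % NS_DATA
P_TEXT = '{%s}' % NS_TEXT

out = io.BytesIO()
with etree.xmlfile(out, encoding='utf-8') as xf:
    with xf.element(P_DATA + 'D-Spin', nsmap={None: NS_DATA}):
        xf.write('\n')  # line break after the opening <D-Spin> tag
        with xf.element(P_TEXT + 'TextCorpus', lang='de',
                        nsmap={None: NS_TEXT}):
            xf.write('\n')  # line break after the opening <TextCorpus> tag
            tokens = etree.Element(P_TEXT + 'tokens', nsmap={None: NS_TEXT})
            etree.SubElement(tokens, P_TEXT + 'token')
            # pretty_print only affects this subtree.
            xf.write(tokens, pretty_print=True)
        xf.write('\n')  # line break after the closing </TextCorpus> tag

result = out.getvalue().decode('utf-8')
print(result)
```

This only inserts line breaks, not indentation; indenting the outer elements would mean writing the leading whitespace by hand as well.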