lxml incremental XML serialisation repeats namespaces
Hello,

I am currently serializing some largish XML files in Python with lxml. I want to use the incremental writer for that. My XML format relies heavily on namespaces and attributes. When I run the following code

    from io import BytesIO
    from lxml import etree

    sink = BytesIO()

    nsmap = {
        'test': 'http://test.org',
        'foo': 'http://foo.org',
        'bar': 'http://bar.org',
    }

    with etree.xmlfile(sink) as xf:
        with xf.element("test:testElement", nsmap=nsmap):
            name = etree.QName(nsmap["foo"], "fooElement")
            elem = etree.Element(name)
            xf.write(elem)

    print(sink.getvalue().decode('utf-8'))

then I get the following output:

    <test:testElement xmlns:bar="http://bar.org" xmlns:foo="http://foo.org" xmlns:test="http://test.org">
      <ns0:fooElement xmlns:ns0="http://foo.org"/>
    </test:testElement>

As you can see, the namespace for `foo` is declared again, and with a generated prefix instead of mine:

    <ns0:fooElement xmlns:ns0="http://foo.org"/>

How do I make lxml declare the namespace only in the root, so that children use the correct prefix from there? I think I need to use `etree.Element`, as I need to add some attributes to the node.

What did not work:

1) Using `register_namespace`

    for prefix, uri in nsmap.items():
        etree.register_namespace(prefix, uri)

That still repeats the declaration, but makes the prefix correct. I do not like it too much, as it changes things globally.

2) Specifying the `nsmap` in the element:

    elem = etree.Element(name, nsmap=nsmap)

yields

    <foo:fooElement xmlns:bar="http://bar.org" xmlns:foo="http://foo.org" xmlns:test="http://test.org"/>

for the `fooElement`.

I also looked in the documentation and source code of lxml, but it is Cython, so it is really hard to read and search. The context manager of `xf.element` does not return the element, e.g.

    with xf.element('foo:fooElement') as e:
        print(e)

prints `None`.

I already asked on Stack Overflow (https://stackoverflow.com/questions/53083828/lmxl-incremental-xml-serialisat...), but did not receive a suitable answer.

Regards,
Jan
On 06.11.2018 at 11:54, Jan-Christoph Klie <jck@mrklie.com> wrote:
I don't think you can, but it also makes no difference to the XML. FWIW I always use "{%s}" formatting for handling namespaces.

Charlie
--
Charlie Clark
Managing Director
Clark Consulting & Research
German Office
Kronenstr. 27a
Düsseldorf D- 40217
Tel: +49-211-600-3657
Mobile: +49-178-782-6226
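For readers unfamiliar with the "{%s}" idiom mentioned above: it builds qualified names in Clark notation, `{namespace-uri}localname`, which lxml accepts anywhere a tag or attribute name is expected. The following is a minimal sketch using the `foo` namespace URI from the question; the `id` attribute and its value are hypothetical and only there for illustration.

```python
from lxml import etree

FOO_NS = "http://foo.org"  # namespace URI taken from the question

# Clark notation: "{namespace-uri}localname"
tag = "{%s}fooElement" % FOO_NS

# etree.QName builds the same qualified name from its two parts
qname = etree.QName(FOO_NS, "fooElement")

elem = etree.Element(tag)
# Attribute names can be qualified the same way ("id" is a hypothetical attribute)
elem.set("{%s}id" % FOO_NS, "42")

print(tag)
print(qname.text == tag)
```

Because the name carries the URI rather than a prefix, the code does not depend on whichever prefix ends up in the serialised output.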
Hi, Jan-Christoph Klie wrote on 06.11.18 at 11:54:
That's not currently supported. It could be, but it's not easy, since the incremental serialiser is separate from the normal Element serialiser, and both are used here, and have their own separate namespace handling. But as Charlie Clark already said, it doesn't matter. Any namespace-aware XML parser will happily handle these redundant declarations.
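To illustrate why the redundant declaration is harmless, here is a small sketch that parses two variants of the document from the question: one with the repeated `ns0` declaration and one that reuses the `foo` prefix from the root. The byte strings are hand-written approximations of the outputs shown in the thread, not captured output.

```python
from lxml import etree

# Child redeclares the namespace under a generated prefix
redundant = (
    b'<test:testElement xmlns:test="http://test.org" xmlns:foo="http://foo.org">'
    b'<ns0:fooElement xmlns:ns0="http://foo.org"/>'
    b'</test:testElement>'
)

# Child reuses the prefix declared on the root
compact = (
    b'<test:testElement xmlns:test="http://test.org" xmlns:foo="http://foo.org">'
    b'<foo:fooElement/>'
    b'</test:testElement>'
)

child_a = etree.fromstring(redundant)[0]
child_b = etree.fromstring(compact)[0]

# Both children resolve to the same qualified name; only the prefix differs
same_name = child_a.tag == child_b.tag
print(child_a.tag, child_b.tag, same_name)
```

A namespace-aware consumer compares the resolved names, so the two documents are equivalent for it; the difference only matters for file size or for tools that look at raw prefixes.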
You could selectively declare only the namespaces you need. But again, it really doesn't matter. XML parsers will be happy with this.
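One way to read "selectively declare" is to give the child element an `nsmap` containing only the namespace it actually uses. The sketch below assumes that reading; the `id` attribute is hypothetical, and the root tag is written in Clark notation rather than as `test:testElement`.

```python
from io import BytesIO
from lxml import etree

nsmap = {
    "test": "http://test.org",
    "foo": "http://foo.org",
    "bar": "http://bar.org",
}

sink = BytesIO()
with etree.xmlfile(sink) as xf:
    with xf.element("{%s}testElement" % nsmap["test"], nsmap=nsmap):
        name = etree.QName(nsmap["foo"], "fooElement")
        # Declare only the "foo" namespace on the child, not the whole map
        elem = etree.Element(name, nsmap={"foo": nsmap["foo"]})
        elem.set("id", "1")  # hypothetical attribute
        xf.write(elem)

output = sink.getvalue().decode("utf-8")
print(output)
```

The child then carries the `foo` prefix and, per the reply above, may still repeat the `foo` declaration, but it no longer drags the unrelated `bar` and `test` declarations along.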
At that point, the element has already been serialised, so there is nothing left to do with it.

Stefan
hey, On 06/11/18 13:54, Jan-Christoph Klie wrote:
why do you mix two subsystems? the following seems to work

    with etree.xmlfile(sink) as xf:
        with xf.element("test:testElement", nsmap=nsmap):
            name = etree.QName(nsmap["foo"], "fooElement")
            with xf.element(name):
                pass

output:

    <test:testElement xmlns:bar="http://bar.org" xmlns:foo="http://foo.org" xmlns:test="http://test.org">
      <foo:fooElement></foo:fooElement>
    </test:testElement>

hth,
burak
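Since the original question needed attributes on the child, the same approach can be extended: `xf.element` accepts an `attrib` mapping, and text can be written inside the context with `xf.write`. This is a sketch under those assumptions; the `id` attribute and the text content are hypothetical, and the root tag is given in Clark notation.

```python
from io import BytesIO
from lxml import etree

nsmap = {
    "test": "http://test.org",
    "foo": "http://foo.org",
    "bar": "http://bar.org",
}

sink = BytesIO()
with etree.xmlfile(sink) as xf:
    with xf.element("{%s}testElement" % nsmap["test"], nsmap=nsmap):
        name = etree.QName(nsmap["foo"], "fooElement")
        # Attributes go through the incremental writer as well ("id" is hypothetical)
        with xf.element(name, attrib={"id": "1"}):
            xf.write("some text")  # text content, escaped by the writer

output = sink.getvalue().decode("utf-8")
print(output)
```

Staying inside the incremental writer means only one namespace handler is involved, so the child picks up the `foo` prefix declared on the root instead of redeclaring it.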
participants (4)
- Burak Arslan
- Charlie Clark
- Jan-Christoph Klie
- Stefan Behnel