lxml incremental XML serialisation repeats namespaces - lxml - The Python XML Toolkit

Nov. 6, 2018

Hello,

I am currently serializing some largish XML files in Python with lxml and want to use its incremental writer for that. My XML format relies heavily on namespaces and attributes. When I run the following code:

    from io import BytesIO

    from lxml import etree

    sink = BytesIO()

    nsmap = {
        'test': 'http://test.org',
        'foo': 'http://foo.org',
        'bar': 'http://bar.org',
    }

    with etree.xmlfile(sink) as xf:
        with xf.element("test:testElement", nsmap=nsmap):
            name = etree.QName(nsmap["foo"], "fooElement")
            elem = etree.Element(name)

            xf.write(elem)

    print(sink.getvalue().decode('utf-8'))

then I get the following output:

    <test:testElement xmlns:bar="http://bar.org" 
     xmlns:foo="http://foo.org" 
     xmlns:test="http://test.org">
        <ns0:fooElement xmlns:ns0="http://foo.org"/>
    </test:testElement>

As you can see, the `foo` namespace is declared again on the child, and lxml uses a generated prefix instead of mine:

    <ns0:fooElement xmlns:ns0="http://foo.org"/>

How can I make lxml declare the namespaces only on the root, so that the children reuse the prefixes declared there? I think I need to use `etree.Element`, because I need to add some attributes to the node.
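A possible fix, sketched under an assumption: lxml's incremental writer appears to resolve the prefixes of elements opened with `xf.element()` against the namespaces declared by the enclosing `xf.element()` blocks, whereas `xf.write()` serialises a finished element on its own. If that holds, opening the child with `xf.element()` and passing its attributes through the `attrib` argument should reuse the root's `foo` prefix. The `qname` helper and the `bar:attr` attribute are hypothetical, added only for illustration.

```python
from io import BytesIO

from lxml import etree

NSMAP = {
    'test': 'http://test.org',
    'foo': 'http://foo.org',
    'bar': 'http://bar.org',
}


def qname(prefix, local):
    """Hypothetical helper: build a Clark-notation tag like '{http://foo.org}fooElement'."""
    return etree.QName(NSMAP[prefix], local).text


sink = BytesIO()
with etree.xmlfile(sink) as xf:
    # Declare every namespace once, on the root element.
    with xf.element(qname('test', 'testElement'), nsmap=NSMAP):
        # Open the child through the writer instead of building an
        # etree.Element; attributes are supplied up front via attrib.
        attrib = {qname('bar', 'attr'): '1'}  # hypothetical attribute
        with xf.element(qname('foo', 'fooElement'), attrib):
            pass

out = sink.getvalue().decode('utf-8')
print(out)
```

The trade-off is that a start tag cannot be changed once it has been written, so the attributes must be known before the block is entered.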

What did not work:

1) Using `register_namespace`

    for prefix, uri in nsmap.items():
        etree.register_namespace(prefix, uri)

The declaration is still repeated, but at least the prefix is correct. I do not like this approach much, as it changes the prefix mapping globally.

2) Specifying the `nsmap` in the element:

    elem = etree.Element(name, nsmap=nsmap)

yields

    <foo:fooElement xmlns:bar="http://bar.org" 
     xmlns:foo="http://foo.org" 
     xmlns:test="http://test.org"/>

for the `fooElement`.
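A small check of why both attempts behave this way, assuming `xf.write()` serialises a complete element exactly as `etree.tostring()` would (a reading of the behaviour, not something confirmed in this thread): an element built without an `nsmap` carries no prefix of its own, so serialising it alone yields the generated `ns0` prefix plus a local declaration, and the same bytes then show up inside the root.

```python
from io import BytesIO

from lxml import etree

FOO_NS = 'http://foo.org'

elem = etree.Element(etree.QName(FOO_NS, 'fooElement'))

# Serialising the element on its own: no nsmap, so lxml invents 'ns0'.
standalone = etree.tostring(elem)
print(standalone)

sink = BytesIO()
with etree.xmlfile(sink) as xf:
    with xf.element('{http://test.org}testElement',
                    nsmap={'test': 'http://test.org', 'foo': FOO_NS}):
        xf.write(elem)

written = sink.getvalue()
print(written)
# The fragment inside the root matches the standalone serialisation.
print(standalone in written)
```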

I also looked at the lxml documentation and source code, but the source is Cython, which makes it hard to read and search. The context manager of `xf.element` does not return the element, e.g.

    with xf.element('foo:fooElement') as e:
        print(e)

prints `None`.
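Since the context manager yields `None`, here is a sketch of how content can still be added, assuming nested `xf.element()` blocks reuse the prefixes declared on the root: everything in the start tag is passed when the block is opened, and text or children are written while it is open. The `records` list and the `bar:id` and `bar:label` names are hypothetical stand-ins for real data.

```python
from io import BytesIO

from lxml import etree

NSMAP = {
    'test': 'http://test.org',
    'foo': 'http://foo.org',
    'bar': 'http://bar.org',
}
FOO = '{http://foo.org}'
BAR = '{http://bar.org}'

# Hypothetical records standing in for the real data.
records = [('a', 'first'), ('b', 'second')]

sink = BytesIO()
with etree.xmlfile(sink) as xf:
    with xf.element('{http://test.org}testElement', nsmap=NSMAP):
        for ident, label in records:
            # xf.element() yields None, so the tag and its attributes
            # must be fully known when the block is opened.
            with xf.element(FOO + 'fooElement', {BAR + 'id': ident}):
                # Content goes in while the block is open: strings become
                # escaped text, nested xf.element() calls become children.
                with xf.element(BAR + 'label'):
                    xf.write(label)

out = sink.getvalue().decode('utf-8')
print(out)
```

Because each record is written and then discarded, this pattern keeps memory use flat even for large files.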

I already asked on Stack Overflow (https://stackoverflow.com/questions/53083828/lmxl-incremental-xml-serialisat...), but did not receive a suitable answer.

Regards,

Jan


Participants (4): Jan-Christoph Klie, Charlie Clark, Stefan Behnel, Burak Arslan