xml.etree and namespaces -- why?
Jon Ribbens
jon+usenet at unequivocal.eu
Wed Oct 19 12:15:24 EDT 2022
On 2022-10-19, Robert Latest <boblatest at yahoo.com> wrote:
> If the XML input has namespaces, tags and attributes with prefixes
> in the form prefix:sometag get expanded to {uri}sometag where the
> prefix is replaced by the full URI.
>
> Which means that given an Element e, I cannot directly access its attributes
> using e.get() because in order to do that I need to know the URI of the
> namespace.
That's because you *always* need to know the URI of the namespace,
because that's its only meaningful identifier. If you assume that a
particular namespace always uses the same prefix then your code will be
completely broken. The following two pieces of XML should be understood
identically:
<svg xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape">
<g inkscape:label="Ebene 1" inkscape:groupmode="layer" id="layer1">
and:
<svg xmlns:epacskni="http://www.inkscape.org/namespaces/inkscape">
<g epacskni:label="Ebene 1" epacskni:groupmode="layer" id="layer1">
So you can see why e.get('inkscape:label') cannot possibly work, and why
e.get('{http://www.inkscape.org/namespaces/inkscape}label') makes sense.
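A minimal sketch of that point, assuming the two fragments above are closed off into complete documents (the closing tags and the names doc_a, doc_b, INKSCAPE_NS and layer_label are additions for illustration only):

```python
import xml.etree.ElementTree as ET

INKSCAPE_NS = "http://www.inkscape.org/namespaces/inkscape"

# The two fragments from the post, completed so that they parse.
doc_a = (
    '<svg xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape">'
    '<g inkscape:label="Ebene 1" inkscape:groupmode="layer" id="layer1"/>'
    '</svg>'
)
doc_b = (
    '<svg xmlns:epacskni="http://www.inkscape.org/namespaces/inkscape">'
    '<g epacskni:label="Ebene 1" epacskni:groupmode="layer" id="layer1"/>'
    '</svg>'
)

def layer_label(xml_text):
    """Return (lookup by {uri}name, lookup by prefix:name) for the <g> element."""
    root = ET.fromstring(xml_text)
    g = root.find("g")  # <g> itself is in no namespace in these fragments
    by_uri = g.get("{%s}label" % INKSCAPE_NS)
    by_prefix = g.get("inkscape:label")  # the prefix is gone after parsing
    return by_uri, by_prefix

for doc in (doc_a, doc_b):
    print(layer_label(doc))
```

Both documents give the same result for the URI-qualified lookup, while the prefixed lookup finds nothing in either, including the one that really does use the inkscape prefix.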
The xml.etree author obviously knew that this was cumbersome, and
hence you can do something like:
namespaces = {'inkscape': 'http://www.inkscape.org/namespaces/inkscape'}
element = root.find('inkscape:foo', namespaces)
which will work for both of the above pieces of XML.
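As a rough, self-contained illustration: the sample documents below are made up (the post does not show a foo element), and the prefix "ink" is deliberately different from both documents to show that the key in the mapping is only a local alias for the URI.

```python
import xml.etree.ElementTree as ET

# Local alias -> namespace URI. The alias need not match any prefix
# used in the documents themselves.
namespaces = {"ink": "http://www.inkscape.org/namespaces/inkscape"}

# Hypothetical documents: same namespace URI, different prefixes.
doc_a = (
    '<svg xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape">'
    '<inkscape:foo id="a"/></svg>'
)
doc_b = (
    '<svg xmlns:epacskni="http://www.inkscape.org/namespaces/inkscape">'
    '<epacskni:foo id="b"/></svg>'
)

found = [ET.fromstring(doc).find("ink:foo", namespaces) for doc in (doc_a, doc_b)]
tags = [element.tag for element in found]
ids = [element.get("id") for element in found]
print(tags, ids)
```

In both cases the element is found, and its tag is the expanded {uri}foo form.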
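But unfortunately, as far as I can see, nobody's thought about doing the
same for attributes rather than tags: Element.get() takes no namespaces
argument, so there you still have to spell out the full {uri}name form.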
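If the long form gets tiresome, a small wrapper is easy enough to write yourself. The following is only a sketch: get_ns is a hypothetical helper, not part of xml.etree, and the sample document is the second fragment from above with closing tags added.

```python
import xml.etree.ElementTree as ET

def get_ns(element, name, namespaces, default=None):
    """Hypothetical helper: Element.get() that also accepts 'prefix:attr'.

    The prefix is looked up in the caller's own mapping, never in the
    document, so it works whatever prefix the XML happens to use.
    """
    # Names already in {uri}local form contain ':' inside the URI, so
    # leave those alone and only expand genuine prefix:local names.
    if not name.startswith("{") and ":" in name:
        prefix, local = name.split(":", 1)
        name = "{%s}%s" % (namespaces[prefix], local)  # KeyError if prefix unknown
    return element.get(name, default)

namespaces = {"inkscape": "http://www.inkscape.org/namespaces/inkscape"}

root = ET.fromstring(
    '<svg xmlns:epacskni="http://www.inkscape.org/namespaces/inkscape">'
    '<g epacskni:label="Ebene 1" epacskni:groupmode="layer" id="layer1"/>'
    '</svg>'
)
g = root.find("g")
print(get_ns(g, "inkscape:label", namespaces))
print(get_ns(g, "id", namespaces))
```

Unprefixed names and names already written as {uri}local are passed straight through to Element.get().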