[lxml] Re: XML namespaces are not propagated over from the ancestor elements when using find* methods

Jan. 13, 2025

      Hi,
...
> I have an XML document that has namespaces in it.  I want to use the find*
> methods to select elements, but it looks like that is not possible without
> passing the namespaces explicitly to every call.  Is this true?  Is this by
> design?  This seems very burdensome when the namespaces are already declared
> in the XML document.  It would be really nice if the namespaces in the XML
> document could be taken into account.

IMHO the way to go is to define, once, a prefix-to-namespace mapping that
matches the prefixes you use in your find/xpath expressions. Then simply pass
it as the namespaces argument to find/xpath. E.g.
>>> from lxml import etree
>>> xml = b"""<?xml version="1.0" encoding="UTF-8"?>
... <root xmlns="urn:defaultnamespace">
...     <Timestamp>2025-01-09T17:46:08.766Z</Timestamp>
...     <namespaced xmlns:ns="urn:subns">
...         <rootdefault>text</rootdefault>
...         <ns:element>element</ns:element>
...     </namespaced>
... </root>
... """
>>> root = etree.fromstring(xml)
>>> # subsequently use this ns mapping:
>>> namespaces = {"default": "urn:defaultnamespace", "sub": "urn:subns"}
>>> root.find("./default:namespaced/sub:element", namespaces=namespaces)
<Element {urn:subns}element at 0x7e14867f4dc0>
or
>>> root.xpath("./default:namespaced/sub:element", namespaces=namespaces)
[<Element {urn:subns}element at 0x7e14867f4dc0>]
Not burdensome in my book. ;-)

Note that I deliberately deviate from the prefixes used in the original
document here, just for illustration.
So you don't need to know beforehand which prefixes the document you want to
process uses. You do, of course, need to know the qualified names for your
find/xpath expressions (i.e. "{namespace-uri}element-name" in Clark
notation).
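
To make the Clark notation point concrete, here is a minimal sketch. It uses a trimmed copy of the sample document above; the variable names are only illustrative. The same element is found once with the namespace URIs spelled out inline and once with self-chosen prefixes:

```python
from lxml import etree

# Trimmed copy of the sample document from this thread.
XML = b"""<root xmlns="urn:defaultnamespace">
    <namespaced xmlns:ns="urn:subns">
        <rootdefault>text</rootdefault>
        <ns:element>element</ns:element>
    </namespaced>
</root>"""

root = etree.fromstring(XML)

# Clark notation: no prefix mapping needed, the URI is written inline.
by_clark = root.find("{urn:defaultnamespace}namespaced/{urn:subns}element")

# Same lookup with arbitrary, self-chosen prefixes.
ns = {"default": "urn:defaultnamespace", "sub": "urn:subns"}
by_prefix = root.find("default:namespaced/sub:element", namespaces=ns)

# etree.QName builds the Clark-notation tag from namespace URI and local name.
tag = etree.QName("urn:subns", "element").text
```

Both lookups address the element by its qualified name; the prefixes are just shorthand for the URIs.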

For XPath, you can't use an empty prefix; see also
https://lxml.de/xpathxslt.html#namespaces-and-prefixes.
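
If you would rather derive the mapping from the document than write it by hand, one possible approach is to copy an element's nsmap and give the default namespace (stored under the key None) a real prefix, so the result is usable with xpath(). The helper name xpath_nsmap and the prefix "default" are my own choices for this sketch, not part of lxml:

```python
from lxml import etree

# Trimmed copy of the sample document from this thread.
XML = b"""<root xmlns="urn:defaultnamespace">
    <namespaced xmlns:ns="urn:subns">
        <rootdefault>text</rootdefault>
        <ns:element>element</ns:element>
    </namespaced>
</root>"""


def xpath_nsmap(element, default_prefix="default"):
    """Hypothetical helper: copy element.nsmap, renaming the None key."""
    return {prefix or default_prefix: uri
            for prefix, uri in element.nsmap.items()}


root = etree.fromstring(XML)
container = root[0]  # <namespaced>, where both namespaces are in scope

ns = xpath_nsmap(container)
matches = root.xpath("./default:namespaced/ns:element", namespaces=ns)
```

This assumes the document does not already bind the chosen prefix to some other namespace; in that case the document's binding would be silently replaced in the copy.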

You might even want to "precompile" xpath expressions using etree.XPath,
like
>>> namespaces = {"default": "urn:defaultnamespace", "sub": "urn:subns"}
>>> find_element = etree.XPath("./default:namespaced/sub:element",
...                            namespaces=namespaces)
>>> find_element(root)
[<Element {urn:subns}element at 0x7e14868127c0>]
If you really wanted to, you could use functools.partial to create your own
namespace-map-aware find functions:
>>> import functools
>>> find = functools.partial(root.__class__.find, namespaces={None:
...     "urn:defaultnamespace", "ns": "urn:subns"})
>>> find(root, "./namespaced/ns:element")
<Element {urn:subns}element at 0x7e148686cb80>
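
Since the find* methods accept the None key for the default namespace (as in the partial example above), a related sketch is to pass an element's own nsmap straight through. This relies on nsmap reflecting only the declarations in scope at that element, so it has to be taken from an element at or below the point where the prefix is declared; root.nsmap, for instance, would not contain "ns" here. Variable names are illustrative:

```python
from lxml import etree

# Trimmed copy of the sample document from this thread.
XML = b"""<root xmlns="urn:defaultnamespace">
    <namespaced xmlns:ns="urn:subns">
        <rootdefault>text</rootdefault>
        <ns:element>element</ns:element>
    </namespaced>
</root>"""

root = etree.fromstring(XML)
container = root.find("{urn:defaultnamespace}namespaced")

# container.nsmap includes the default namespace inherited from <root>
# (under the key None) plus the "ns" prefix declared on <namespaced>.
in_scope = container.nsmap

# Prefixed name, resolved via the document's own "ns" prefix.
element = container.find("ns:element", namespaces=in_scope)

# Unprefixed name, resolved against the default namespace (key None).
sibling = container.find("rootdefault", namespaces=in_scope)
```

The drawback is that the expressions now depend on whatever prefixes the document happens to use, which is exactly what a self-defined mapping avoids.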
If you wanted to use unqualified names, you could do something like
>>> root.xpath("./*[local-name()='namespaced']/*[local-name()='element']")
[<Element {urn:subns}element at 0x7e14867f4dc0>]
But I wouldn't advise it: it's clunky in XPath 1.0 anyhow and has performance
implications.
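
One further thing to keep in mind with local-name(), sketched below on a made-up document (the urn:other namespace and the item elements are invented for this example): the test ignores the namespace altogether, so same-named elements from unrelated namespaces match as well.

```python
from lxml import etree

# Hypothetical document: two <item> elements in different namespaces.
XML = b"""<root xmlns="urn:defaultnamespace" xmlns:other="urn:other">
    <item>wanted</item>
    <other:item>unrelated</other:item>
</root>"""

root = etree.fromstring(XML)

# local-name() matches on the local part only, regardless of namespace.
by_local_name = root.xpath("./*[local-name()='item']")

# A qualified name selects only the element in the intended namespace.
ns = {"default": "urn:defaultnamespace"}
qualified = root.xpath("./default:item", namespaces=ns)
```

Here by_local_name contains both item elements, while qualified contains only the one from urn:defaultnamespace.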

I'd just go with the simplest option, i.e. define and reuse a namespaces dict.

Best regards,
Holger

jholg＠gmx.de