Hi,
I have an XML document that has namespaces in it. I want to use the find* methods to select elements, but it looks like this is not possible without passing the namespaces explicitly to every call. Is this true? Is this by design? This seems very burdensome when the namespaces are already declared in the XML document. It would be really nice if the namespaces in the XML document could be taken into account.
IMHO the way to go is to just predefine, once, a prefix-to-namespace mapping that fits the prefixes you use in your find/xpath expressions. Then simply pass this as the namespaces argument to find/xpath. E.g.
>>> from lxml import etree
>>> xml = b"""<?xml version="1.0" encoding="UTF-8"?>
... <root xmlns="urn:defaultnamespace">
... <Timestamp>2025-01-09T17:46:08.766Z</Timestamp>
... <namespaced xmlns:ns="urn:subns">
... <rootdefault>text</rootdefault>
... <ns:element>element</ns:element>
... </namespaced>
... </root>
... """
>>> root = etree.fromstring(xml)
>>> # subsequently use this ns mapping:
>>> namespaces = {"default": "urn:defaultnamespace", "sub": "urn:subns"}
>>> root.find("./default:namespaced/sub:element", namespaces=namespaces)
<Element {urn:subns}element at 0x7e14867f4dc0>
or
>>> root.xpath("./default:namespaced/sub:element", namespaces=namespaces)
[<Element {urn:subns}element at 0x7e14867f4dc0>]
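Since the question was about the find* methods in general: as far as I know, the same mapping can be passed to find, findall, findtext and iterfind alike. A minimal sketch, assuming the sample document from above (the variable names are only for illustration):

```python
from lxml import etree

xml = b"""<?xml version="1.0" encoding="UTF-8"?>
<root xmlns="urn:defaultnamespace">
<Timestamp>2025-01-09T17:46:08.766Z</Timestamp>
<namespaced xmlns:ns="urn:subns">
<rootdefault>text</rootdefault>
<ns:element>element</ns:element>
</namespaced>
</root>
"""
root = etree.fromstring(xml)

# Define the mapping once; the prefixes are our own choice.
namespaces = {"default": "urn:defaultnamespace", "sub": "urn:subns"}
path = "./default:namespaced/sub:element"

first = root.find(path, namespaces=namespaces)                # first match or None
all_matches = root.findall(path, namespaces=namespaces)       # list of matches
text = root.findtext(path, namespaces=namespaces)             # text of first match
lazy_matches = list(root.iterfind(path, namespaces=namespaces))  # iterator of matches
timestamp = root.findtext("default:Timestamp", namespaces=namespaces)

print(first.tag, len(all_matches), text, timestamp)
```

Note that find returns a single element (or None), while findall and xpath return lists.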
Not burdensome in my book. ;-) Note that I deliberately deviate from the prefixes used in the original document here, just for illustration. So you don't really need to know beforehand which prefixes are used in the document you want to process, but of course you need to know the qualified names for your find/xpath expressions (i.e. "{namespace-uri}element-name" in Clark notation). For XPath, you can't use an empty prefix; see also https://lxml.de/xpathxslt.html#namespaces-and-prefixes. You might even want to "precompile" XPath expressions using etree.XPath, like
>>> namespaces = {"default": "urn:defaultnamespace", "sub": "urn:subns"}
>>> find_element = etree.XPath("./default:namespaced/sub:element", namespaces=namespaces)
>>> find_element(root)
[<Element {urn:subns}element at 0x7e14868127c0>]
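To make the points about prefixes and precompiling a bit more concrete, here is a small sketch. The two documents doc_a and doc_b are made up for illustration: they use the same namespace URIs but declare different prefixes. One compiled expression, and likewise one Clark-notation path, should match in both, because only the URIs matter.

```python
from lxml import etree

# Our own prefixes; only the URIs have to match the documents.
NAMESPACES = {"default": "urn:defaultnamespace", "sub": "urn:subns"}

# Compiled once, reusable for any number of documents.
find_element = etree.XPath("./default:namespaced/sub:element", namespaces=NAMESPACES)

# Two hypothetical documents: same namespace URIs, different prefixes.
doc_a = etree.fromstring(
    b'<root xmlns="urn:defaultnamespace">'
    b'<namespaced xmlns:ns="urn:subns"><ns:element>from a</ns:element></namespaced>'
    b'</root>'
)
doc_b = etree.fromstring(
    b'<d:root xmlns:d="urn:defaultnamespace" xmlns:s="urn:subns">'
    b'<d:namespaced><s:element>from b</s:element></d:namespaced>'
    b'</d:root>'
)

texts = [el.text for doc in (doc_a, doc_b) for el in find_element(doc)]
print(texts)

# The same lookup with find() in Clark notation, no mapping needed at all.
clark_path = "./{urn:defaultnamespace}namespaced/{urn:subns}element"
clark_texts = [doc.find(clark_path).text for doc in (doc_a, doc_b)]
print(clark_texts)
```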
If you really wanted to, you could do some functools.partial currying to create your own namespace-map-aware find functions:
>>> import functools
>>> find = functools.partial(root.__class__.find, namespaces={None: "urn:defaultnamespace", "ns": "urn:subns"})
>>> find(root, "./namespaced/ns:element")
<Element {urn:subns}element at 0x7e148686cb80>
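The same currying trick should work for xpath, with one caveat: the None key for the default namespace is accepted by find but not by xpath, which needs a real prefix for every namespace. A rough sketch, assuming a trimmed-down version of the sample document; in the lxml versions I am aware of the empty prefix is rejected with a TypeError, but treat the exact exception type as an assumption:

```python
import functools
from lxml import etree

xml = b"""<root xmlns="urn:defaultnamespace">
<namespaced xmlns:ns="urn:subns">
<ns:element>element</ns:element>
</namespaced>
</root>"""
root = etree.fromstring(xml)

# find(): None may stand for the default namespace.
find = functools.partial(
    type(root).find,
    namespaces={None: "urn:defaultnamespace", "ns": "urn:subns"},
)
found = find(root, "./namespaced/ns:element")

# xpath(): every namespace needs a non-empty prefix.
xpath = functools.partial(
    type(root).xpath,
    namespaces={"default": "urn:defaultnamespace", "ns": "urn:subns"},
)
matches = xpath(root, "./default:namespaced/ns:element")

# Passing an empty (None) prefix to xpath() is rejected.
empty_prefix_error = None
try:
    root.xpath("./namespaced", namespaces={None: "urn:defaultnamespace"})
except TypeError as exc:  # assumption: lxml signals this with TypeError
    empty_prefix_error = str(exc)

print(found.tag, [m.tag for m in matches], empty_prefix_error)
```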
If you wanted to use unqualified names, you could do something like
>>> root.xpath("./*[local-name()='namespaced']/*[local-name()='element']")
[<Element {urn:subns}element at 0x7e14867f4dc0>]
But I wouldn't advise it: it's clunky in XPath 1.0 anyhow and has performance implications. I'd just go with the simplest option, i.e. define and reuse a namespaces dict.

Best regards,
Holger