Separate the namespace URI and local name from the Python API
Hi,

I would like to get the namespace URI and local part of an element’s name separately. I think I could manage with the C/Cython API, but I don’t see how to do it from Python. The best I can think of is:

    def split_ns(element):
        tag = element.tag
        if '}' in tag:
            ns, local = tag.split('}', 1)
            return ns[1:], local  # remove '{' from ns
        else:
            return None, tag

Is the above correct? (Ie. can the URI or local part ever contain '}'?) Is there a better way?

These appear to work, but using XPath for this seems overkill:

    etree.XPath('namespace-uri(.)')
    etree.XPath('local-name(.)')

Regards,
--
Simon Sapin
Hi Simon, On Sunday, 17 June 2012, 16:18:45, Simon Sapin wrote:
I would like to get the namespace URI and local part of an element’s name separately. I think I could manage with the C/Cython API, but I don’t see how to do it from Python. The best I can think of is:
    def split_ns(element):
        tag = element.tag
        if '}' in tag:
            ns, local = tag.split('}', 1)
            return ns[1:], local  # remove '{' from ns
        else:
            return None, tag
Is the above correct? (Ie. can the URI or local part ever contain '}'?) Is there a better way?
I had the same problem, but solved it with a regular expression. If you are only interested in the local name, try this:

-------------
import re

qname = re.compile("{(?P<ns>.*)}(?P<element>.*)")

def localname(node):
    """Returns the local name part of node"""
    m = qname.search(node.tag)
    return m.groupdict().get("element") if m else node.tag

# We assume there is a variable root which is an element
# with root.tag == '{http://docbook.org/ns/docbook}book'
print(localname(root))  # prints "book"
-------------

The above localname() function works with and without a namespace. Either way, it returns the local name of the element. If you need both the URI and the local name, use the "qname" pattern directly, or modify the above function to return both parts. If there is a match, the groupdict() method of the match object returns a dictionary like this:

    {'ns': 'http://docbook.org/ns/docbook', 'element': 'book'}

With a more clever regular expression, you can make the {...} part optional and still get a match:

    qname = re.compile("({(?P<ns>.*)})?(?P<element>.*)")
    m = qname.search("chapter")  # just as an example
    m.groupdict()
    # There is no namespace, therefore 'ns' is None:
    {'ns': None, 'element': 'chapter'}

I don't think there is a method to return the local name in the lxml API (at least, I haven't seen one).

Hope that helps.

--
Gruß/Regards
Thomas Schraitle
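The suggestion above to "modify the function to return both parts" could look roughly like the following. This is only a sketch: the name split_qname is made up for illustration and is not part of lxml, and it takes the tag string rather than an element so that it has no dependencies. It uses [^}]* for the namespace part instead of the greedy .* so that the first closing brace ends the namespace.

```python
import re

# Hypothetical helper, not part of lxml: splits a tag in "{namespace}local"
# notation into its two parts using an optional namespace group.
_TAG_PATTERN = re.compile(r"(?:\{(?P<ns>[^}]*)\})?(?P<local>.*)", re.DOTALL)


def split_qname(tag):
    """Return (namespace, localname); namespace is None if the tag has none."""
    # The namespace group is optional and the local part matches anything,
    # so fullmatch() succeeds for every input string.
    m = _TAG_PATTERN.fullmatch(tag)
    return m.group("ns"), m.group("local")


if __name__ == "__main__":
    print(split_qname("{http://docbook.org/ns/docbook}book"))
    print(split_qname("chapter"))
```

With an element at hand, one would call it as split_qname(element.tag).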
Simon Sapin, 17.06.2012 16:18:
I would like to get the namespace URI and local part of an element’s name separately. I think I could manage with the C/Cython API, but I don’t see how to do it from Python. The best I can think of is:
    def split_ns(element):
        tag = element.tag
        if '}' in tag:
            ns, local = tag.split('}', 1)
            return ns[1:], local  # remove '{' from ns
        else:
            return None, tag
Is the above correct? (Ie. can the URI or local part ever contain '}'?) Is there a better way?
You can use the QName class:

    >>> from lxml import etree
    >>> qn = etree.QName("lname")
    >>> qn.localname
    u'lname'
    >>> qn.namespace
    >>> qn = etree.QName("{ns}lname")
    >>> qn.localname
    u'lname'
    >>> qn.namespace
    u'ns'
    >>> qn = etree.QName(etree.Element("{ns}lname"))
    >>> qn.localname
    u'lname'
    >>> qn.namespace
    u'ns'

It does a bit more than strictly required, but that should rarely matter in practice. I guess this would make a good addition to the tutorial section on namespaces.

Stefan
On 17/06/2012 19:31, Stefan Behnel wrote:
    >>> qn = etree.QName(etree.Element("{ns}lname"))
    >>> qn.localname
    u'lname'
    >>> qn.namespace
    u'ns'
Exactly what I was looking for. Thanks!

The parameter is named text_or_uri_or_element in the source, but the docstring has text_or_uri and does not mention elements for the parameter value.

Internally it looks like it still goes through the .tag property: build a string with {} and then split it. Oh well, at least it should be correct.

--
Simon Sapin
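To make the "build a string with {} and then split it" round trip concrete, here is a small pure-Python illustration. It is not lxml's actual implementation; the names to_clark and from_clark are hypothetical. It splits at the last '}' rather than the first, on the reasoning that an XML local name can never contain '}', so the last closing brace is the safer boundary if a namespace string ever happened to contain one.

```python
def to_clark(namespace, localname):
    """Build a "{namespace}localname" tag string (hypothetical helper)."""
    if namespace is None:
        return localname
    return "{%s}%s" % (namespace, localname)


def from_clark(tag):
    """Split a tag string back into (namespace, localname)."""
    if tag.startswith("{"):
        # rpartition splits at the LAST '}' in the string.
        ns, sep, local = tag[1:].rpartition("}")
        if sep:
            return ns, local
    # No well-formed "{...}" prefix: treat the whole string as the local name.
    return None, tag


if __name__ == "__main__":
    tag = to_clark("http://docbook.org/ns/docbook", "book")
    print(tag)
    print(from_clark(tag))
    print(from_clark("lname"))
```

The string concatenation followed by a split is the overhead mentioned above; it only matters if this runs in a tight loop over many elements.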
Simon Sapin, 17.06.2012 19:46:
On 17/06/2012 19:31, Stefan Behnel wrote:
    >>> qn = etree.QName(etree.Element("{ns}lname"))
    >>> qn.localname
    u'lname'
    >>> qn.namespace
    u'ns'
Exactly what I was looking for. Thanks!
The parameter is named text_or_uri_or_element in the source, but the docstring has text_or_uri and does not mention elements for the parameter value.
Hmm, right, thanks for catching that.
Internally it looks like it still goes through the .tag property: build a string with {} and then split it. Oh well, at least it should be correct.
Yes, there's some overhead involved. That's easy enough to change if it proves necessary, but I have yet to see a performance-critical use case.

Stefan
On 17/06/2012 20:54, Stefan Behnel wrote:
Internally it looks like it still goes through the .tag property: build a string with {} and then split it. Oh well, at least it should be correct.
Yes, there's some overhead involved. That's easy enough to change if it proves necessary, but I'm yet to see a performance critical use case.
Yes, it’s probably not critical. I especially wanted something that is always correct. If QName is not, it’s not my fault ;) -- Simon Sapin
participants (3)
- Simon Sapin
- Stefan Behnel
- Thomas Schraitle