etree.XML() nsmap gives partial namespace
I am using nsmap on the result of etree.XML() to get the list of all the namespaces used in a given XML document. The following code, <code-1>, works fine and gives the expected output, <output-1>: the nsmap contains all the namespaces that are used in the document. One point to note is that all the XML namespaces used in this document are declared on the root element itself.

<code-1>
from lxml import etree

xml_str_1 = '''<?xml version='1.0' encoding='UTF-8'?>
<ns0:root xmlns:ns0="http://thisisroot.com/rootelem/" xmlns:ns1=" http://thisischild.com/childelem1/" xmlns:ns2=" http://thisischild.com/childelem2/" xmlns:ns3=" http://thisischild.com/childelem3/" xmlns:ns4=" http://thisischild.com/childelem4/">
<ns1:child1>child1</ns1:child1>
<ns2:child2>child2</ns2:child2>
<ns3:child3>child3</ns3:child3>
<ns4:child4>child4</ns4:child4>
</ns0:root>'''

xml_1 = etree.XML(xml_str_1)
ns_1 = xml_1.nsmap

print "xml document = ", xml_str_1
print "namespace = ", ns_1
</code-1>

<output-1>
xml document = <?xml version='1.0' encoding='UTF-8'?>
<ns0:root xmlns:ns0="http://thisisroot.com/rootelem/" xmlns:ns1=" http://thisischild.com/childelem1/" xmlns:ns2=" http://thisischild.com/childelem2/" xmlns:ns3=" http://thisischild.com/childelem3/" xmlns:ns4=" http://thisischild.com/childelem4/">
<ns1:child1>child1</ns1:child1>
<ns2:child2>child2</ns2:child2>
<ns3:child3>child3</ns3:child3>
<ns4:child4>child4</ns4:child4>
</ns0:root>
namespace = {'ns0': 'http://thisisroot.com/rootelem/', 'ns1': ' http://thisischild.com/childelem1/', 'ns2': ' http://thisischild.com/childelem2/', 'ns3': ' http://thisischild.com/childelem3/', 'ns4': ' http://thisischild.com/childelem4/'}
</output-1>

In the following <code-2>, I have constructed an XML document with etree.Element() and etree.SubElement(), using the same namespaces, and used etree.tostring() to serialize it. One thing to note is that the document generated by etree.tostring() does not declare all the namespaces on the root element; each namespace is declared on the element where it is first needed. When I pass that generated document to etree.XML(), the nsmap does not contain all the namespaces used in the document. Instead, it contains only the namespace declared on the root element.

<code-2>
from lxml import etree

root = '{http://thisisroot.com/rootelem/}root'
c1 = '{http://thisischild.com/childelem1/}child1'
c2 = '{http://thisischild.com/childelem2/}child2'
c3 = '{http://thisischild.com/childelem3/}child3'
c4 = '{http://thisischild.com/childelem4/}child4'

e_root = etree.Element(root)
etree.SubElement(e_root, c1).text = 'child1'
etree.SubElement(e_root, c2).text = 'child2'
etree.SubElement(e_root, c3).text = 'child3'
etree.SubElement(e_root, c4).text = 'child4'

xml_str_2 = etree.tostring(e_root, pretty_print=True, xml_declaration=True, encoding='UTF-8')
xml_2 = etree.XML(xml_str_2)
ns_2 = xml_2.nsmap

print "xml document = ", xml_str_2
print "namespace = ", ns_2
</code-2>

<output-2>
xml document = <?xml version='1.0' encoding='UTF-8'?>
<ns0:root xmlns:ns0="http://thisisroot.com/rootelem/">
  <ns1:child1 xmlns:ns1="http://thisischild.com/childelem1/ ">child1</ns1:child1>
  <ns2:child2 xmlns:ns2="http://thisischild.com/childelem2/ ">child2</ns2:child2>
  <ns3:child3 xmlns:ns3="http://thisischild.com/childelem3/ ">child3</ns3:child3>
  <ns4:child4 xmlns:ns4="http://thisischild.com/childelem4/ ">child4</ns4:child4>
</ns0:root>
namespace = {'ns0': 'http://thisisroot.com/rootelem/'}
</output-2>

In the case of <code-2>, is there a way to get all the namespaces used in a given XML document? Please suggest.

Thank you,
Sangeeth
Sangeeth Saravanaraj, 03.06.2013 16:34:
In the case of <code-2>, is there a way to get all the namespaces used in a given xml document?!
Does this help?

http://lxml.de/FAQ.html#how-can-i-find-out-which-namespace-prefixes-are-used...

Stefan
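For readers of the archive: an element's nsmap in lxml only reports the prefixes that are in scope at that element (its own declarations plus those inherited from its ancestors), which is why the root of the <code-2> document shows only ns0. One possible way to gather every declaration is to walk the whole tree and merge the per-element maps. The following is a minimal sketch under that assumption, not necessarily the approach the FAQ entry describes; the helper name collect_namespaces and the sample string XML_DOC are made up for illustration.

```python
from lxml import etree


def collect_namespaces(root):
    """Merge the nsmap of every element in the tree into one dict.

    Assumption: a given prefix is bound to only one URI in the document.
    If a prefix is re-declared with a different URI further down, the
    declaration visited last overwrites the earlier one.
    """
    namespaces = {}
    # tag=etree.Element restricts iteration to real elements,
    # skipping comments and processing instructions.
    for element in root.iter(tag=etree.Element):
        namespaces.update(element.nsmap)
    return namespaces


# Hypothetical sample: namespaces are declared on the children,
# as in the serialized document from <code-2>.
XML_DOC = (
    '<ns0:root xmlns:ns0="http://thisisroot.com/rootelem/">'
    '<ns1:child1 xmlns:ns1="http://thisischild.com/childelem1/">child1</ns1:child1>'
    '<ns2:child2 xmlns:ns2="http://thisischild.com/childelem2/">child2</ns2:child2>'
    '</ns0:root>'
)

root = etree.XML(XML_DOC)
all_namespaces = collect_namespaces(root)

# A default namespace would appear under the key None, so sort with a
# fallback key to keep this loop safe on Python 3.
for prefix, uri in sorted(all_namespaces.items(), key=lambda item: item[0] or ""):
    print(prefix, uri)
```

If the same prefix can be bound to different URIs in different subtrees, collecting (prefix, uri) pairs into a set instead of a dict would avoid silently losing one of the bindings.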
participants (2)
- Sangeeth Saravanaraj
- Stefan Behnel