[Tutor] XML parsing

Thu Mar 29 22:03:37 EDT 2018

On Thu, Mar 29, 2018 at 9:40 PM, Asif Iqbal <vadud3 at gmail.com> wrote:

>
>
> On Thu, Mar 29, 2018 at 3:41 PM, Peter Otten <__peter__ at web.de> wrote:
>
>> Asif Iqbal wrote:
>>
>> > On Thu, Mar 29, 2018 at 3:56 AM, Peter Otten <__peter__ at web.de> wrote:
>> >
>> >> Asif Iqbal wrote:
>> >>
>> >> > I am trying to extract all the *template-name*s but have had no success yet.
>> >> >
>> >> > Here is a sample XML file:
>> >> >
>> >> > <collection xmlns:y="http://tail-f.com/ns/rest">
>> >> >   <template-metadata xmlns="http://networks.com/nms">
>> >> >     <template-name>ALLFLEX-BLOOMINGTON</template-name>
>> >> >     <type>post-staging</type>
>> >> >     <device-type>full-mesh</device-type>
>> >> >     <provider-tenant>ALLFLEX</provider-tenant>
>> >> >     <subscription xmlns="http://networks.com/nms">
>> >> >       <solution-tier>advanced-plus</solution-tier>
>> >> >       <bandwidth>1000</bandwidth>
>> >> >       <is-analytics-enabled>true</is-analytics-enabled>
>> >> >       <is-primary>true</is-primary>
>> >> >     </subscription>
>> >> > ....
>> >> > </collection>
>> >> >
>> >> > with open('/tmp/template-metadata') as f:
>> >> >     import xml.etree.ElementTree as ET
>> >> >     root = ET.fromstring(f.read())
>> >> >
>> >> > print len(root)
>> >> > print root[0][0].text
>> >> > for l in root.findall('template-metadata'):
>> >> >     print l
>> >> >
>> >> >
>> >> > 392
>> >> > ALLFLEX-BLOOMINGTON
>> >> >
>> >> >
>> >> > It prints the number of top-level elements and the text of the first
>> >> > element's first child, but when I loop through to find all the
>> >> > 'template-name's, it does not print anything.
>> >> >
>> >> > What am I doing wrong?
>> >>
>> >> You have to include the namespace:
>> >>
>> >> for l in root.findall('{http://networks.com/nms}template-metadata'):
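A minimal sketch of why the bare name fails, using a trimmed copy of the sample document. The default `xmlns="..."` on `<template-metadata>` becomes part of the parsed tag, so only the `{namespace-uri}local-name` form (Clark notation) matches:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document from the question.
XML = """<collection xmlns:y="http://tail-f.com/ns/rest">
  <template-metadata xmlns="http://networks.com/nms">
    <template-name>ALLFLEX-BLOOMINGTON</template-name>
    <type>post-staging</type>
  </template-metadata>
</collection>"""

root = ET.fromstring(XML)

# The bare local name matches nothing, because the parsed tag
# is '{http://networks.com/nms}template-metadata'.
plain = root.findall('template-metadata')

# Clark notation: '{namespace-uri}local-name'.
qualified = root.findall('{http://networks.com/nms}template-metadata')

print(len(plain))      # 0
print(len(qualified))  # 1
```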
>> >>
>> >
>> > How do I extract the 'template-name'?
>>
>> I hoped you'd get the idea.
>>
>> > This is what I tried
>> >
>> >  for l in root.findall('{http://networks.com/nms}template-metadata'):
>>
>> Rinse and repeat:
>>
>> >     print l.find('template-name').text
>>
>> should be
>>
>>     print l.find('{http://networks.com/nms}template-name').text
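Put together, the loop with both the parent and the child tag qualified looks roughly like this. It is a sketch on a trimmed copy of the sample, and `NMS` is only a local convenience name, not part of ElementTree:

```python
import xml.etree.ElementTree as ET

# Clark-notation prefix for the default namespace in the sample
# (NMS is a local helper constant, not an ElementTree API).
NMS = '{http://networks.com/nms}'

XML = """<collection xmlns:y="http://tail-f.com/ns/rest">
  <template-metadata xmlns="http://networks.com/nms">
    <template-name>ALLFLEX-BLOOMINGTON</template-name>
    <type>post-staging</type>
  </template-metadata>
</collection>"""

root = ET.fromstring(XML)

names = []
for meta in root.findall(NMS + 'template-metadata'):
    # <template-name> inherits the default namespace from its parent,
    # so the child lookup needs the same '{uri}' prefix.
    names.append(meta.find(NMS + 'template-name').text)

print(names)  # ['ALLFLEX-BLOOMINGTON']
```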
>>
>> >
>> > I am following the findall example in section 19.7.1.3 of the docs at
>> > https://docs.python.org/2/library/xml.etree.elementtree.html
>> >
>> > I get this error: AttributeError: 'NoneType' object has no attribute 'text'.
>> > I do not understand why l.find('template-name') returns None.
>>
>> Take the time to read
>>
>> https://docs.python.org/2/library/xml.etree.elementtree.html#parsing-xml-with-namespaces
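The point of that section, shown directly: a default `xmlns="..."` declaration applies to the element and all of its descendants, so `<template-name>` is stored as `{http://networks.com/nms}template-name`. `find('template-name')` therefore matches nothing and returns `None`, and `None.text` raises the `AttributeError`. A small sketch, which also shows `findtext()` returning a fallback value instead of raising:

```python
import xml.etree.ElementTree as ET

XML = """<collection>
  <template-metadata xmlns="http://networks.com/nms">
    <template-name>ALLFLEX-BLOOMINGTON</template-name>
  </template-metadata>
</collection>"""

meta = ET.fromstring(XML)[0]

# The parser stores the inherited default namespace in each child's tag.
child_tags = [child.tag for child in meta]
print(child_tags)  # ['{http://networks.com/nms}template-name']

# find() returns None when nothing matches; None has no .text attribute.
missing = meta.find('template-name')
print(missing)  # None

# findtext(path, default) returns the default instead of raising.
fallback = meta.findtext('template-name', '(not found)')
found = meta.findtext('{http://networks.com/nms}template-name')
print(fallback)  # (not found)
print(found)     # ALLFLEX-BLOOMINGTON
```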
>
>
> Thanks for the links and hints.
>
> I got it working now.
>
> I used ns = {'nms': 'http://networks.com/nms'}
>
> And then l.find('nms:template-name', ns)
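For reference, a self-contained version of the prefix-mapping approach. The `nms` prefix is a local alias chosen by the caller; it does not need to appear anywhere in the document, only the URI has to match:

```python
import xml.etree.ElementTree as ET

XML = """<collection xmlns:y="http://tail-f.com/ns/rest">
  <template-metadata xmlns="http://networks.com/nms">
    <template-name>ALLFLEX-BLOOMINGTON</template-name>
    <type>post-staging</type>
  </template-metadata>
</collection>"""

# 'nms' is an arbitrary local prefix mapped to the document's namespace URI.
ns = {'nms': 'http://networks.com/nms'}

root = ET.fromstring(XML)

# find()/findall() expand 'nms:...' to '{http://networks.com/nms}...'.
names = [meta.find('nms:template-name', ns).text
         for meta in root.findall('nms:template-metadata', ns)]

print(names)  # ['ALLFLEX-BLOOMINGTON']
```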
>
> I also want to extract the namespace, and I see that this gets me the namespace:
>
>       str(root[0]).split('{')[1].split('}')[0]
>
> Is there a better way to extract the namespace?
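One option is to read the element's `tag` attribute directly rather than parsing `str(root[0])`. The `<Element '...' at 0x...>` string is a display format, not an interface you can rely on. A small sketch; `split_tag` is a hypothetical helper name, not an ElementTree function:

```python
import xml.etree.ElementTree as ET


def split_tag(tag):
    """Split a Clark-notation tag '{uri}local' into (uri, local).

    Tags without a namespace come back as ('', tag).
    """
    if tag.startswith('{'):
        uri, _, local = tag[1:].partition('}')
        return uri, local
    return '', tag


root = ET.fromstring(
    '<collection><template-metadata xmlns="http://networks.com/nms"/></collection>')

uri, local = split_tag(root[0].tag)
print(uri)    # http://networks.com/nms
print(local)  # template-metadata
print(split_tag(root.tag))  # ('', 'collection')
```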
>
>
>
This worked:

ns = {'nms': root[0].tag.split('}')[0].split('{')[1]}

for l in root.findall('nms:template-metadata', ns):
    print l.find('nms:template-name', ns).text

Although I think manually creating the ns dictionary looks cleaner :-)
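If the goal is to discover the namespaces without guessing which element declares them, `ET.iterparse()` can report each `xmlns` declaration as a `'start-ns'` event whose payload is a `(prefix, uri)` tuple, with `''` as the prefix for a default namespace. A sketch under the assumption that the document declares exactly one default namespace you care about; `declared_namespaces` is a hypothetical helper name:

```python
import io
import xml.etree.ElementTree as ET

XML = b"""<collection xmlns:y="http://tail-f.com/ns/rest">
  <template-metadata xmlns="http://networks.com/nms">
    <template-name>ALLFLEX-BLOOMINGTON</template-name>
    <subscription xmlns="http://networks.com/nms">
      <bandwidth>1000</bandwidth>
    </subscription>
  </template-metadata>
</collection>"""


def declared_namespaces(source):
    """Return the (prefix, uri) pairs declared in source, in document order."""
    return [pair for _event, pair in ET.iterparse(source, events=('start-ns',))]


pairs = declared_namespaces(io.BytesIO(XML))

# Assumes a single default namespace: take the first one declared and
# map it to a local prefix of our choosing ('nms' is arbitrary).
default_uri = next(uri for prefix, uri in pairs if prefix == '')
ns = {'nms': default_uri}
```

Here `pairs` holds one entry per declaration, so the repeated `xmlns` on `<subscription>` shows up twice; the resulting `ns` can then be passed to `find()`/`findall()` as before.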

>
>>
>>
>> > Here is complete code with output.
>> >
>> >
>> > import xml.etree.ElementTree as ET
>> >
>> > xmlfile='''
>> > <collection xmlns:y="http://tail-f.com/ns/rest">
>> >   <template-metadata xmlns="http://networks.com/nms">
>> >     <template-name>ALLFLEX-BLOOMINGTON</template-name>
>> >     <type>post-staging</type>
>> >     <device-type>full-mesh</device-type>
>> >     <provider-tenant>ALLFLEX</provider-tenant>
>> >     <subscription xmlns="http://networks.com/nms">
>> >       <solution-tier>advanced-plus</solution-tier>
>> >       <bandwidth>1000</bandwidth>
>> >       <is-analytics-enabled>true</is-analytics-enabled>
>> >       <is-primary>true</is-primary>
>> >     </subscription></template-metadata></collection>'''
>> >
>> > root = ET.fromstring(xmlfile)
>> > print root.tag
>> > print root[0][0].text
>> > for l in root.findall('{http://networks.com/nms}template-metadata'):
>> >     print l.find('template-name').text
>> >
>> > collection
>> > ALLFLEX-BLOOMINGTON
>> >
>> >
>> > ---------------------------------------------------------------------------
>> > AttributeError                            Traceback (most recent call last)
>> > <ipython-input-18-73bd6770766a> in <module>()
>> >      19 print root[0][0].text
>> >      20 for l in root.findall('{http://networks.com/nms}template-metadata'):
>> > ---> 21     print l.find('template-name').text
>> >
>> > AttributeError: 'NoneType' object has no attribute 'text'
>>
>
>
>
> --
> Asif Iqbal
> PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
> A: Because it messes up the order in which people normally read text.
> Q: Why is top-posting such a bad thing?
>
>
