[Tutor] cdata/xml question

Sun Apr 13 10:56:29 CEST 2014

bruce wrote:

> The following text contains sample data. I'm simply trying to parse it
> using libxml2dom as the lib to extract data.
> 
> As an example, to get the name/desc
> 
> test data
> <class_meta_data><departments><department><name><![CDATA[A
> HTG]]></name><desc><![CDATA[American
> Heritage]]></desc></department><department>
> <name><![CDATA[ACC]]></name><desc><![CDATA[Accounting]]></desc></department>
> 
>     d = libxml2dom.parseString(s, html=1)
> 
>     p1="//department/name"
>     p2="//department/desc"
> 
>     pcount_ = d.xpath(p1)
>     p2_ = d.xpath(p2)
>     print str(len(pcount_))
>     nba=0
> 
>     for a in pcount_:
>       abbrv=a.nodeValue
>       print abbrv
>       abbrv=a.toString()
>       print abbrv
>       abbrv=a.textContent
>       print abbrv
> 
> neither of the above generates any of the XML name/desc data..
> 
> any pointers on what I'm missing???

Your example seems to work here when I omit the html=1 argument:

    d = libxml2dom.parseString(s)
    ...

> I can/have created a quick parse/split process to get the data, but I
> thought there'd be a straightforward process to extract the data
> using one of the py/libs..

One way using the stdlib:

from xml.etree import ElementTree as ET

# data is the XML document as a string; to read it from a file instead:
#root = ET.parse(filename).getroot()
root = ET.fromstring(data)
for department in root.findall(".//department"):
    # CDATA content is returned as ordinary element text
    name = department.find("name").text
    desc = department.find("desc").text
    print("{}: {}".format(name, desc))
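
For reference, here is a self-contained sketch of the same stdlib approach that can be run as-is. It embeds the sample data from the question; the two closing tags at the end of the string are an assumption, because the quoted sample stops before them. The helper name extract_departments is made up for this illustration, and collapsing whitespace is my own addition to cope with the line breaks inside the CDATA sections.

```python
from xml.etree import ElementTree as ET

# Sample data from the question. The closing </departments> and
# </class_meta_data> tags are assumed; the quoted sample omits them.
SAMPLE = """<class_meta_data><departments><department><name><![CDATA[A
HTG]]></name><desc><![CDATA[American
Heritage]]></desc></department><department><name><![CDATA[ACC]]></name><desc><![CDATA[Accounting]]></desc></department></departments></class_meta_data>"""


def extract_departments(xml_text):
    """Return a list of (name, desc) tuples, one per <department>.

    Hypothetical helper written for this example.
    """
    root = ET.fromstring(xml_text)
    result = []
    for department in root.findall(".//department"):
        # findtext() returns the default instead of raising
        # when the child element is missing.
        name = department.findtext("name", default="")
        desc = department.findtext("desc", default="")
        # Collapse line breaks and runs of whitespace inside the text.
        result.append((" ".join(name.split()), " ".join(desc.split())))
    return result


if __name__ == "__main__":
    for name, desc in extract_departments(SAMPLE):
        print("{}: {}".format(name, desc))
```

Using findtext() with a default rather than find(...).text is a small design choice: a department without a name or desc element then yields an empty string instead of an AttributeError.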