XML Parsing

Wed Feb 25 00:43:47 EST 2009

On Tue, 24 Feb 2009 20:50:20 -0800, Girish wrote:

> Hello,
> 
> I have a xml file which is as follows:
> 
>     <pids>
>         <Parameter_Class>
>             <Parameter Id="pid_031605_093137_283">
>                 <Identifier>$0000</Identifier>
>                 <Type>PID</Type>
>                 <Signal><![CDATA[Parameter Identifiers Supported - $01
> to $20]]></Signal>
>                 <Description><![CDATA[This PID indicates which
> legislated PIDs]]></Description>
>                  ..............
>                  ...............
> 
> Can anyone please tell me how to get content of <Signal> tag.. that is,
> how to extract the data "![CDATA[Parameter Identifiers Supported - $01
> to $20]]"
> 
> Thanks,
> Girish...

The easy way is to use the re module (regular expressions):

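# untested
import re
# xmlstring is assumed to hold the whole file as one string.
# DOTALL lets '.' match newlines (the CDATA text spans two lines);
# '.*?' is non-greedy, so each match stops at the first </Signal>.
signal_pattern = re.compile(r'<Signal>(.*?)</Signal>', re.DOTALL)
signals = signal_pattern.findall(xmlstring)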
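
Note that a plain <Signal>...</Signal> pattern hands back the raw markup, CDATA wrapper included. If you only want the text inside, something like the following might do. It is only a sketch: the xmlstring fragment here is made up from the sample in the question, and the optional groups assume at most one CDATA section per <Signal>.

```python
import re

# Hypothetical fragment modelled on the file in the question.
xmlstring = """<Signal><![CDATA[Parameter Identifiers Supported - $01
to $20]]></Signal>
<Signal>plain text</Signal>"""

# Non-greedy match across newlines (DOTALL); the two optional
# non-capturing groups skip a CDATA wrapper when one is present.
signal_pattern = re.compile(
    r'<Signal>\s*(?:<!\[CDATA\[)?(.*?)(?:\]\]>)?\s*</Signal>', re.DOTALL)

signals = signal_pattern.findall(xmlstring)
print(signals)
```

The embedded newline from the original file is kept in the result, so you may want to replace it with a space afterwards.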

Alternatively, you can use the xml module, which is more reliable when the 
text <Signal>...</Signal> can also turn up somewhere that is not a real 
element, for example inside a comment: <!-- <Signal>blooo</Signal> -->

>>> import xml.dom.minidom
>>> xmldata = xml.dom.minidom.parse('myfile.xml')
>>> for node in xmldata.getElementsByTagName('Signal'): print(node.toxml())
...
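
toxml() prints the whole element, tags and CDATA markers included. If only the text inside is wanted, it can be collected from the element's child nodes. A minimal sketch, assuming a cut-down copy of the file from the question (SAMPLE and the helper name signal_texts are made up for illustration):

```python
import xml.dom.minidom

# Hypothetical sample modelled on the file in the question.
SAMPLE = """<pids>
    <Parameter_Class>
        <Parameter Id="pid_031605_093137_283">
            <Identifier>$0000</Identifier>
            <Type>PID</Type>
            <Signal><![CDATA[Parameter Identifiers Supported - $01
to $20]]></Signal>
        </Parameter>
    </Parameter_Class>
</pids>"""


def signal_texts(xml_text):
    """Return the text content of every <Signal> element."""
    doc = xml.dom.minidom.parseString(xml_text)
    texts = []
    for node in doc.getElementsByTagName('Signal'):
        # Text and CDATA children both carry their content in .data
        parts = [child.data for child in node.childNodes
                 if child.nodeType in (child.TEXT_NODE,
                                       child.CDATA_SECTION_NODE)]
        texts.append(''.join(parts))
    return texts


print(signal_texts(SAMPLE))
```

For a file on disk, xml.dom.minidom.parse('myfile.xml') can stand in for parseString; the loop stays the same.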