[Tutor] identifying and parsing string in text file

Sat Mar 8 22:35:28 CET 2008

Bryan Fodness wrote:
> I have a large file that has many lines like this,
>  
> <element tag="300a,0014" vr="CS" vm="1" len="4" 
> name="DoseReferenceStructureType">SITE</element>
> I would like to identify the line by the tag (300a,0014) and then grab 
> the name (DoseReferenceStructureType) and value (SITE).
>  
> I would like to create a file that would have the structure,
>  
>      DoseReferenceStructureType = Site

Presuming that your source file is XML, I think I would use 
ElementTree.iterparse() to process this.
http://effbot.org/zone/element-iterparse.htm
http://docs.python.org/lib/elementtree-functions.html
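To see what iterparse() hands back, here is a small sketch that runs it over the sample line from your message. It makes two assumptions:
- The line is wrapped in a hypothetical <data> root element, because a real file needs a single root element to be well-formed XML.
- io.BytesIO stands in for the file, so no file on disk is needed.

```python
import io
from xml.etree.ElementTree import iterparse

# Hypothetical sample: the line from the question inside an assumed <data> root.
SAMPLE = b'''<data>
<element tag="300a,0014" vr="CS" vm="1" len="4" name="DoseReferenceStructureType">SITE</element>
</data>'''

# With the default events, iterparse() yields ('end', element) once each
# element has been completely read, so inner elements come before the root.
events = []
for event, elem in iterparse(io.BytesIO(SAMPLE)):
    events.append((event, elem.tag, elem.get('name'), elem.text))

for e in events:
    print(e)
```

Each element arrives only after its closing tag has been seen. Its text and attributes are therefore complete when you look at it, and it is safe to clear() it afterwards.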

Something like this (untested):

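from xml.etree.ElementTree import iterparse

out = open('myoutput.txt', 'w')

# iterparse() wants a filename or an open file, not the file's contents
for event, elem in iterparse('mydata.xml'):
    if elem.tag == "element":
        name = elem.get('name')   # attributes are read with .get(), not []
        text = elem.text
        out.write('%s = %s\n' % (name, text))
        elem.clear()   # discard the processed element to save memory

out.close()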
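The loop above writes out every <element>. To pick out lines by their tag attribute, as you asked, one option is to check elem.get('tag') as well. The sketch below wraps this in a generator, assuming the same layout as your sample. The function name extract, the <data> root, and the second element (tag "9999,0001") are all made up for illustration.

```python
import io
from xml.etree.ElementTree import iterparse

def extract(source, wanted_tags):
    """Yield 'name = value' strings for <element> nodes whose tag attribute
    is in wanted_tags. source may be a filename or a binary file object."""
    for event, elem in iterparse(source):
        if elem.tag == 'element':
            if elem.get('tag') in wanted_tags:
                yield '%s = %s' % (elem.get('name'), elem.text)
            elem.clear()  # free memory once the element has been handled

# Hypothetical sample data; the second element is invented and should be skipped.
SAMPLE = b'''<data>
<element tag="300a,0014" vr="CS" vm="1" len="4" name="DoseReferenceStructureType">SITE</element>
<element tag="9999,0001" vr="LO" vm="1" len="7" name="SomeOtherAttribute">IGNORED</element>
</data>'''

lines = list(extract(io.BytesIO(SAMPLE), {'300a,0014'}))
print(lines)  # ['DoseReferenceStructureType = SITE']
```

With a real file you would pass the filename instead, e.g. extract('mydata.xml', {'300a,0014'}), and write each yielded string plus '\n' to the output file. Passing a set of tags lets you pull out several attributes in one pass.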