[Tutor] how to extract text by specifying an element using ElementTree

ps python ps_python3 at yahoo.co.in
Tue Dec 20 19:33:17 CET 2005


Thank you for your email Dr. Johnson. 


I need to print :

gene_symbol (from line
<gene_symbol>ALDH3A1</gene_symbol>)
entry_cdna  (from line
<entry_cdna>NM_000691.3</entry_cdna>)
molecular_class 
(from line
<molecular_class>Enzyme:Dehydrogenase</molecular_class>)


title (from tags <molecular_function><title>Catalytic
activity</title>)

title (from tags section <biological_processes>
<biological_process><title>Metabolism</title>)

title (from tags section 
<cellular_component><primary><title>cytoplasm</title>)


This is how I tried:

from elementtree.ElementTree import ElementTree
mydata = ElementTree(file='00004.xml')
>>> for process in
mydata.findall('//biological_process'):
	print process.get('title').text

	
>>> for m in mydata.findall('//functions'):
	print m.get('molecular_class').text

	
>>> for m in mydata.findall('//functions'):
	print m.find('molecular_class').text.strip()


>>> for process in
mydata.findall('//biological_process'):
	print process.get('title').text

	
>>> for m in mydata.findall('//functions'):
	print m.get('molecular_class').text

	
>>> for m in mydata.findall('//functions'):
	print m.get('title').text.strip()

	
>>> for m in mydata.findall('//biological_processes'):
	  print m.get('title').text.strip()

	  
>>> 


Result:
I get nothing.  No error.  I have no clue why it is
not giving me the result. 

I also tried this alternate way:

>>> strdata = """<functions>
      <molecular_class>Enzyme:
Dehydrogenase</molecular_class>
      
         <molecular_function>
            <title>Catalytic activity</title>
            <goid>0003824</goid>
         </molecular_function>
      
      <biological_processes>
         <biological_process>
            <title>Metabolism</title>
            <goid>0008152</goid>
         </biological_process>
         <biological_process>
            <title>Energy pathways</title>
            <goid>0006091</goid>
         </biological_process>
      </biological_processes>
    </functions>"""

>>> from elementtree import ElementTree
>>> tree = ElementTree.fromstring(strdata)
>>> for m in tree.findall('//functions'):
	print m.find('molecular_class').text

	

Traceback (most recent call last):
  File "<pyshell#18>", line 1, in -toplevel-
    for m in tree.findall('//functions'):
  File
"C:\Python23\Lib\site-packages\elementtree\ElementTree.py",
line 352, in findall
    return ElementPath.findall(self, path)
  File
"C:\Python23\Lib\site-packages\elementtree\ElementPath.py",
line 195, in findall
    return _compile(path).findall(element)
  File
"C:\Python23\Lib\site-packages\elementtree\ElementPath.py",
line 173, in _compile
    p = Path(path)
  File
"C:\Python23\Lib\site-packages\elementtree\ElementPath.py",
line 74, in __init__
    raise SyntaxError("cannot use absolute path on
element")
SyntaxError: cannot use absolute path on element
>>> for m in tree.findall('functions'):
	print m.find('molecular_class').text

	
>>> for m in tree.findall('functions'):
	print m.find('molecular_class').text.strip()

	
>>> for m in tree.findall('functions'):
	print m.get('molecular_class').text

	


Do you thing it is a problem with the XML files
instead. 

Thank you for valuable suggestions. 

kind regards, 
M




--- Kent Johnson <kent37 at tds.net> wrote:

> ps python wrote:
> > Dear Drs. Johnson and Yoo , 
> >  for the last 1 week I have been working on
> parsing
> > the elements from a bunch of XML files following
> your
> > suggestions. 
> > 
> > from elementtree.ElementTree import ElementTree
> > 
> >>>>mydata = ElementTree(file='00004.xml')
> >>>>for process in
> > 
> > mydata.findall('//biological_process'):
> > 	print process.text
> 
> Looking at the data, neither <biological_process>
> nor <functions> elements directly
> contain text, they have children that contain text.
> Try
>    print process.get('title').text
> to print the title.
> 
> >>>>for proc in mydata.findall('functions'):
> > 	print proc
> 
> I think you want findall('//functions') to find
> <functions> at any depth in the tree.
> 
> If this doesn't work please show the results you get
> and tell us what you expect.
> 
> Kent
> 
> _______________________________________________
> Tutor maillist  -  Tutor at python.org
> http://mail.python.org/mailman/listinfo/tutor
> 


Send instant messages to your online friends http://in.messenger.yahoo.com 


More information about the Tutor mailing list