[Tutor] how to extract text by specifying an element using ElementTree
ps python
ps_python3 at yahoo.co.in
Tue Dec 20 19:33:17 CET 2005
Thank you for your email, Dr. Johnson.
I need to print:
- gene_symbol (from <gene_symbol>ALDH3A1</gene_symbol>)
- entry_cdna (from <entry_cdna>NM_000691.3</entry_cdna>)
- molecular_class (from <molecular_class>Enzyme:Dehydrogenase</molecular_class>)
- title (from <molecular_function><title>Catalytic activity</title>)
- title (from the <biological_processes> section: <biological_process><title>Metabolism</title>)
- title (from the <cellular_component> section: <cellular_component><primary><title>cytoplasm</title>)
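For reference, here is a minimal sketch of pulling each of the fields listed above out of one record. It uses the standard-library `xml.etree.ElementTree` (the module the `elementtree` package became in Python 2.5) with Python 3 syntax. The record layout is an assumption: only the `<functions>` part appears in this post, so the `<gene>` root element and the placement of `<gene_symbol>`, `<entry_cdna>` and `<cellular_component>` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: the <gene> root and the positions of gene_symbol,
# entry_cdna and cellular_component are assumptions; <functions> follows
# the sample data quoted in this post.
SAMPLE = """<gene>
  <gene_symbol>ALDH3A1</gene_symbol>
  <entry_cdna>NM_000691.3</entry_cdna>
  <functions>
    <molecular_class>Enzyme:Dehydrogenase</molecular_class>
    <molecular_function>
      <title>Catalytic activity</title>
      <goid>0003824</goid>
    </molecular_function>
    <biological_processes>
      <biological_process>
        <title>Metabolism</title>
        <goid>0008152</goid>
      </biological_process>
      <biological_process>
        <title>Energy pathways</title>
        <goid>0006091</goid>
      </biological_process>
    </biological_processes>
  </functions>
  <cellular_component>
    <primary><title>cytoplasm</title></primary>
  </cellular_component>
</gene>"""


def extract_fields(root):
    """Collect the wanted fields from a parsed record.

    './/' means "at any depth below root"; findtext() returns the text of
    the first matching element, or None if nothing matches.
    """
    def text(path):
        value = root.findtext(path)
        return value.strip() if value is not None else None

    return {
        "gene_symbol": text(".//gene_symbol"),
        "entry_cdna": text(".//entry_cdna"),
        "molecular_class": text(".//molecular_class"),
        "molecular_function": text(".//molecular_function/title"),
        # A gene can have several processes, so collect every title.
        "biological_processes": [
            t.text.strip()
            for t in root.findall(".//biological_processes/biological_process/title")
        ],
        "cellular_component": text(".//cellular_component/primary/title"),
    }


root = ET.fromstring(SAMPLE)
for name, value in extract_fields(root).items():
    print(name, ":", value)
```

To run it on a real file, the `fromstring` call can be swapped for `ET.parse('00004.xml').getroot()`, which returns the root element of the parsed file.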
This is what I tried:
>>> from elementtree.ElementTree import ElementTree
>>> mydata = ElementTree(file='00004.xml')
>>> for process in mydata.findall('//biological_process'):
    print process.get('title').text
>>> for m in mydata.findall('//functions'):
    print m.get('molecular_class').text
>>> for m in mydata.findall('//functions'):
    print m.find('molecular_class').text.strip()
>>> for m in mydata.findall('//functions'):
    print m.get('title').text.strip()
>>> for m in mydata.findall('//biological_processes'):
    print m.get('title').text.strip()
Result: I get no output and no error. I have no idea why it isn't giving me the result.
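One thing worth checking in the attempts above: on an element, `get(name)` reads an XML *attribute* (as in `<tag name="...">`), not a child element, so `process.get('title')` returns `None`, and calling `.text` on that would raise an `AttributeError`. Child elements are reached with `find()` or `findtext()`. Since no error appeared at all, it seems the loop bodies never ran, i.e. `findall()` returned an empty list. A small sketch in Python 3 with the stdlib `xml.etree.ElementTree`, using a trimmed copy of the sample data:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of one <biological_process> from the sample data.
proc = ET.fromstring(
    "<biological_process>"
    "<title>Metabolism</title><goid>0008152</goid>"
    "</biological_process>"
)

# get() looks up an attribute; this element has none, so the result is None.
print(proc.get("title"))        # None

# find() returns the child element; its .text holds the string.
print(proc.find("title").text)  # Metabolism

# findtext() is a shortcut for find(...).text.
print(proc.findtext("title"))   # Metabolism

# An empty findall() result means the loop body never executes, so no
# error is raised even though the body itself is wrong.
for p in proc.findall("no_such_tag"):
    print(p.get("title").text)  # never reached
```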
I also tried this alternate way:
>>> strdata = """<functions>
<molecular_class>Enzyme:
Dehydrogenase</molecular_class>
<molecular_function>
<title>Catalytic activity</title>
<goid>0003824</goid>
</molecular_function>
<biological_processes>
<biological_process>
<title>Metabolism</title>
<goid>0008152</goid>
</biological_process>
<biological_process>
<title>Energy pathways</title>
<goid>0006091</goid>
</biological_process>
</biological_processes>
</functions>"""
>>> from elementtree import ElementTree
>>> tree = ElementTree.fromstring(strdata)
>>> for m in tree.findall('//functions'):
    print m.find('molecular_class').text
Traceback (most recent call last):
File "<pyshell#18>", line 1, in -toplevel-
for m in tree.findall('//functions'):
File
"C:\Python23\Lib\site-packages\elementtree\ElementTree.py",
line 352, in findall
return ElementPath.findall(self, path)
File
"C:\Python23\Lib\site-packages\elementtree\ElementPath.py",
line 195, in findall
return _compile(path).findall(element)
File
"C:\Python23\Lib\site-packages\elementtree\ElementPath.py",
line 173, in _compile
p = Path(path)
File
"C:\Python23\Lib\site-packages\elementtree\ElementPath.py",
line 74, in __init__
raise SyntaxError("cannot use absolute path on
element")
SyntaxError: cannot use absolute path on element
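The traceback is consistent with what `fromstring()` returns: the root *element* (here `<functions>` itself), not an `ElementTree` wrapper, and a path starting with `/` is rejected when searching from an element. A relative `.//` path searches the element's descendants instead. A sketch with the stdlib `xml.etree.ElementTree` in Python 3 (the trimmed XML is an excerpt of the sample above):

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<functions>"
    "<molecular_class>Enzyme:Dehydrogenase</molecular_class>"
    "<molecular_function><title>Catalytic activity</title></molecular_function>"
    "</functions>"
)

# fromstring() returns the root Element itself, here <functions>.
print(root.tag)  # functions

# A path starting with '/' is rejected on an Element.
try:
    root.findall("//functions")
    absolute_path_error = None
except SyntaxError as exc:
    absolute_path_error = str(exc)
    print("SyntaxError:", exc)

# A relative './/' path searches descendants of root (not root itself),
# so <functions> is not found, but elements below it are.
print(root.findall(".//functions"))       # []
print(root.findall(".//title")[0].text)   # Catalytic activity
```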
>>> for m in tree.findall('functions'):
    print m.find('molecular_class').text
>>> for m in tree.findall('functions'):
    print m.find('molecular_class').text.strip()
>>> for m in tree.findall('functions'):
    print m.get('molecular_class').text
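These last attempts print nothing for a related reason: `findall('functions')` only looks at the direct children of the element it is called on, and `tree` here already *is* the `<functions>` element. Asking the root for its own child works, and `iter()` (which visits the element itself as well as its descendants) finds `<functions>` whether or not it happens to be the root. A sketch in Python 3 with the stdlib `xml.etree.ElementTree`, on a trimmed copy of the sample:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<functions>"
    "<molecular_class>\n  Enzyme:Dehydrogenase\n</molecular_class>"
    "</functions>"
)

# findall('functions') searches root's *children*; root is already
# <functions>, so there is nothing to match.
print(root.findall("functions"))  # []

# Ask root for its own child instead; strip() removes the surrounding
# whitespace and newlines.
print(root.findtext("molecular_class").strip())  # Enzyme:Dehydrogenase

# iter() includes root itself, so this also covers the case where
# <functions> is the document root.
for functions in root.iter("functions"):
    print(functions.findtext("molecular_class").strip())
```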
Do you think the problem might be with the XML files instead?
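Whether 00004.xml itself is at fault can't be told from the post, but a quick way to check is to look at the tag names ElementTree actually sees. One common cause of silent empty results (an assumption here, not something the post confirms for this file) is a default XML namespace: ElementTree then reports tags as `{uri}name`, and plain names such as `gene_symbol` match nothing. The namespace URI and the `describe` helper below are hypothetical, for illustration only:

```python
import xml.etree.ElementTree as ET


def describe(root):
    """Hypothetical helper: return the root tag and the sorted tag names.

    If names come back as '{some-uri}gene_symbol', the file declares an
    XML namespace, and search paths must include that '{uri}' prefix.
    """
    tags = sorted({elem.tag for elem in root.iter()})
    return root.tag, tags


# For a real file: root = ET.parse("00004.xml").getroot()
# Hypothetical namespaced document, to show what the prefix looks like.
root = ET.fromstring(
    '<gene xmlns="http://example.com/hypothetical">'
    "<gene_symbol>ALDH3A1</gene_symbol>"
    "</gene>"
)
root_tag, tags = describe(root)
print(root_tag)
print(tags)

# A plain name finds nothing; the '{uri}name' form does.
ns = "{http://example.com/hypothetical}"
print(root.find("gene_symbol"))            # None
print(root.find(ns + "gene_symbol").text)  # ALDH3A1
```

If the printed root tag and tag names look like the plain names used in the paths above, the file is probably fine and the paths are the thing to adjust.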
Thank you for your valuable suggestions.
Kind regards,
M
--- Kent Johnson <kent37 at tds.net> wrote:
> ps python wrote:
> > Dear Drs. Johnson and Yoo,
> > For the last week I have been working on parsing
> > the elements from a bunch of XML files following
> > your suggestions.
> >
> > from elementtree.ElementTree import ElementTree
> >
> > >>> mydata = ElementTree(file='00004.xml')
> > >>> for process in mydata.findall('//biological_process'):
> >     print process.text
>
> Looking at the data, neither <biological_process> nor <functions>
> elements directly contain text, they have children that contain text.
> Try
>     print process.get('title').text
> to print the title.
>
> > >>> for proc in mydata.findall('functions'):
> >     print proc
>
> I think you want findall('//functions') to find
> <functions> at any depth in the tree.
>
> If this doesn't work please show the results you get
> and tell us what you expect.
>
> Kent
>
> _______________________________________________
> Tutor maillist - Tutor at python.org
> http://mail.python.org/mailman/listinfo/tutor
>
More information about the Tutor
mailing list