[Tutor] how to extract text by specifying an element using ElementTree
ps_python3 at yahoo.co.in
Tue Dec 20 18:26:39 CET 2005
Dear Drs. Johnson and Yoo,
For the last week I have been working on parsing
the elements from a bunch of XML files following your
suggestions. Until now I have been unsuccessful, and I have no clue why
I am failing.
I have ~16K XML files. This data was obtained from Johns
Hopkins University (of course these are public data
and may be used for teaching and non-commercial purposes).
>>> from elementtree.ElementTree import ElementTree
>>> mydata = ElementTree(file='00004.xml')
>>> for process in mydata.findall('//biological_process'):
...     print process.text
...
>>> for proc in mydata.findall('functions'):
...     print proc.text
...
I do not understand why I am unable to parse this
file. I wondered whether the file is not well structured
(not well-formed). I feel it is properly structured
and yet it is unparsable.
Would you please help me or guide me as to what the problem
is? Apologies if I am completely overlooking something.
PS: Attached is the XML file that I am using.
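
One way to settle the well-formedness question raised above is to let the parser decide. The following is a minimal sketch, assuming the modern standard-library module xml.etree.ElementTree (the successor of the elementtree package used in this thread). The helper name check_well_formed and both sample strings are hypothetical and are not taken from 00004.xml.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return (True, None) if xml_text parses, else (False, error message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # The parser reports the line and column of the first problem it hits.
        return False, str(exc)
    return True, None


# Hypothetical samples: the first is well-formed, the second never
# closes its <functions> element.
good = "<entry><functions><function>binding</function></functions></entry>"
bad = "<entry><functions></entry>"

print(check_well_formed(good))
print(check_well_formed(bad))
```

If the file parses without an error, it is well-formed, and an empty result from findall() more likely points at the search path than at the file itself.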
--- Kent Johnson <kent37 at tds.net> wrote:
> ps python wrote:
> > Kent and Dany,
> > Thanks for your replies.
> > Here fromstring() assumes that the input is in a
> > text format.
> Right, that is for the sake of a simple example.
> > what should be the case when I am reading files
> > directly.
> > I am using the following :
> > from elementtree.ElementTree import ElementTree
> > mydata = ElementTree(file='00001.xml')
> > root = mydata.getroot()
> > iter = root.getiterator()
> > Here the whole XML document is loaded as an element
> > tree. How should I get this iter into a form where I can
> > apply the findall() method?
> Call findall() directly on mydata, e.g.
> for process in mydata.findall('//biological_process'):
>     print process.text
> The path //biological_process means find any
> biological_process element
> at any depth from the root element.
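
The difference between a plain tag name and an "any depth" path can be sketched as follows. This is only an illustration under stated assumptions: it uses the standard-library xml.etree.ElementTree module, where the search is written in the relative form './/biological_process', and the SAMPLE document is hypothetical because the real layout of 00004.xml is not shown in this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the element nesting is an assumption.
SAMPLE = """\
<entry>
  <annotation>
    <functions>
      <biological_process>signal transduction</biological_process>
      <biological_process>cell adhesion</biological_process>
    </functions>
  </annotation>
</entry>
"""

# For a file on disk one would use: root = ET.parse('00004.xml').getroot()
root = ET.fromstring(SAMPLE)

# A bare tag name matches only direct children of the element searched,
# so this finds nothing: <functions> sits two levels below <entry>.
direct = root.findall('functions')

# './/' searches all descendants, at any depth below the root.
nested = root.findall('.//biological_process')
texts = [element.text for element in nested]

print(len(direct))
print(texts)
```

In this sample the bare name 'functions' yields an empty list while the descendant path finds both elements, which is one plausible reason a findall() loop can print nothing even though the file parses.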
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 10855 bytes
Url : http://mail.python.org/pipermail/tutor/attachments/20051220/cfdc0134/00004.bin