python xml DOM? pulldom? SAX?
jog
jo at johannageiss.de
Mon Aug 29 11:17:04 EDT 2005
Hi,
I want to extract the text from some nodes of a huge XML file (1.5 GB). The
structure of the XML file is something like this:
<parent>
<page>
<title>bla</title>
<id></id>
<revision>
<id></id>
<text>blablabla</text>
</revision>
</page>
<page>
</page>
....
</parent>
I want to combine the text from page:title and page:revision:text for
every single page element. I then want to index these combined texts
one by one (so one index per page).
What is the most efficient API for that: SAX (I don't think so), DOM,
or pulldom?
Or should I just use XPath somehow?
I don't want to do anything else with this XML file afterwards.
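To make the question concrete, here is a minimal sketch of the pulldom
variant I have in mind. It assumes the structure shown above; the names
iter_page_texts and text_of are made up for illustration, and I have not
measured how it behaves on a file of this size. The idea is to stream
through the file and build a small DOM subtree for one page element at a
time.

```python
import io
from xml.dom import pulldom


def text_of(node):
    """Concatenate all text nodes found below a DOM node."""
    parts = []
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            parts.append(child.data)
        else:
            parts.append(text_of(child))
    return "".join(parts)


def iter_page_texts(source):
    """Yield 'title + newline + text' for each <page> element.

    `source` may be a file name or a binary file-like object.
    """
    events = pulldom.parse(source)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == "page":
            # Build a DOM subtree for this single page only.
            events.expandNode(node)
            titles = node.getElementsByTagName("title")
            texts = node.getElementsByTagName("text")
            title = text_of(titles[0]) if titles else ""
            body = text_of(texts[0]) if texts else ""
            yield title + "\n" + body


if __name__ == "__main__":
    # Tiny in-memory stand-in for the real 1.5 GB file.
    sample = (
        b"<parent>"
        b"<page><title>bla</title><id>1</id>"
        b"<revision><id>2</id><text>blablabla</text></revision></page>"
        b"</parent>"
    )
    for combined in iter_page_texts(io.BytesIO(sample)):
        print(combined)  # each combined text would go to the indexer here
```

For the real file, the call would take the file name instead of the
BytesIO object, and the print would be replaced by whatever does the
indexing.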
I hope someone will understand me.....
Thank you very much
Jog