history xml file parse help

Jonathan Hogg jonathan at onegoodidea.com
Wed Jul 17 03:13:19 EDT 2002


On 17/7/2002 7:47, in article ah33go$b6i$1 at newsreader.mailgate.org,
"a64bs4$1oo$1 at newsreader.mailgate.org" <eugene1977 at hotmail.com> wrote:

> i'm totally newbie to python/xml
> (just reading first book of those)
> 
> i'd like to parse a history file(galeon browser)
> and feed into database
> and crawl each url and save it in database also..
> so that i can do search on it..
> 
> well..
> this history file looks very easy to parse..
> but i couldn't find a working sample code..
> 
> can anyone help me plz?
> thank you

Try:

>>> from xml.dom.minidom import parse
>>> from xml.xpath import Compile, CreateContext
>>> 
>>> doc = parse( 'history.xml' )
>>> query = Compile( '//item' )
>>> 
>>> for item in query.evaluate( CreateContext(doc) ):
...     title = item.getAttribute( 'title' )
...     url = item.getAttribute( 'url' )
...     print title
... 

This assumes that you have the PyXML package installed:

    <http://pyxml.sf.net/>
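
If PyXML isn't available, a rough equivalent can be sketched with just the
standard library's minidom, using getElementsByTagName in place of the XPath
query. This is only a sketch: it is written in Python 3 syntax, the sample
document and its "history" root element are made up for illustration (the
real Galeon file may be laid out differently), and history_items is a
hypothetical helper name.

```python
from xml.dom.minidom import parseString

# Hypothetical sample data; only the <item> element and its title/url/visits
# attributes come from the example above, the <history> root is an assumption.
SAMPLE = """<history>
  <item title="Python" url="http://www.python.org/" visits="3"/>
  <item title="PyXML" url="http://pyxml.sf.net/" visits="1"/>
</history>"""

def history_items(doc):
    """Return (title, url) pairs for every <item> element in the document."""
    return [(item.getAttribute('title'), item.getAttribute('url'))
            for item in doc.getElementsByTagName('item')]

# For a real file, use xml.dom.minidom.parse('history.xml') instead.
doc = parseString(SAMPLE)
for title, url in history_items(doc):
    print(title, url)
```

getElementsByTagName searches the whole tree, so it behaves much like the
'//item' query for a flat file like this one.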

The document structure is simple enough that you don't strictly need an XPath
query, but it makes the code much shorter than walking the DOM nodes by hand.
You can also pull neat tricks like:

    //item[@visits > 1]

to get just the URLs you've visited more than once.
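
The standard library's ElementTree only supports a subset of XPath, without
numeric comparisons such as "@visits > 1", so without PyXML the same filter
would have to be done in Python. A minimal sketch, assuming every item has an
integer "visits" attribute (the sample data, the "history" root element and
the visited_more_than helper are all made up for illustration):

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the <history> root element is an assumption.
SAMPLE = """<history>
  <item title="Python" url="http://www.python.org/" visits="3"/>
  <item title="PyXML" url="http://pyxml.sf.net/" visits="1"/>
</history>"""

def visited_more_than(root, threshold=1):
    """Return the URLs of <item> elements whose visits count exceeds threshold."""
    return [item.get('url')
            for item in root.iter('item')
            # Assumes visits is an integer; a missing attribute counts as 0.
            if int(item.get('visits', '0')) > threshold]

# For a real file, use ET.parse('history.xml').getroot() instead.
root = ET.fromstring(SAMPLE)
print(visited_more_than(root))
```

Unlike the XPath predicate, int() will raise ValueError on a non-numeric
visits value, so real data might need a try/except around it.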

Jonathan



