benash at gmail.com
Sat Apr 26 23:26:04 CEST 2008
On Apr 6, 11:03 pm, Stefan Behnel <stefan... at behnel.de> wrote:
> Benjamin wrote:
> > I'm trying to parse an HTML file. I want to retrieve all of the text
> > inside a certain tag that I find with XPath. The DOM seems to make
> > this available with the innerHTML element, but I haven't found a way
> > to do it in Python.
> import lxml.html as h
> tree = h.parse("somefile.html")
> text = tree.xpath("string( some/element[@condition] )")
I actually had trouble getting this to work. I guess only newer versions
of lxml have the html module, and I couldn't get it installed. lxml
does look pretty cool, though.
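
For anyone stuck in the same spot, here is a rough fallback sketch that needs no lxml at all. It assumes Python 3's standard-library html.parser module (on Python 2 the module was named HTMLParser), and the names TagTextCollector and text_inside are made up for illustration. It matches by tag name only, so it is not a replacement for the XPath condition in Stefan's example, and it does not handle void tags such as br as the target.

```python
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Collect the text found inside every occurrence of one tag (hypothetical helper)."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0    # greater than 0 while we are inside the target tag
        self.chunks = []  # text fragments seen inside the target tag

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Text of nested child tags is included, much like XPath string().
        if self.depth > 0:
            self.chunks.append(data)


def text_inside(html, tag):
    """Return the concatenated text inside all <tag> elements of an HTML string."""
    parser = TagTextCollector(tag)
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.chunks)


sample = "<html><body><div>Hello <b>world</b>!</div><p>skip</p></body></html>"
print(text_inside(sample, "div"))
```

The depth counter is there so that nested occurrences of the same tag are handled; a plain boolean flag would switch off at the first inner closing tag.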