Create a DOM from an HTML document
Paul Boddie
paul at boddie.org.uk
Thu Feb 9 16:50:55 EST 2006
John J. Lee wrote:
> Mark Harrison <mh at pixar.com> writes:
>
> > Ahh, it's BeautifulSoup...
>
> Strictly that's not THE DOM, just A document object model. The DOM
> proper is a standardised interface, which BeautifulSoup does not
> implement. You could build a DOM using BeautifulSoup, though.
For a certain value of standardised, libxml2dom provides "the DOM" for
HTML:
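# Python 2 code (urllib.urlopen and the print statement predate Python 3).
import urllib, libxml2dom
# Fetch the raw HTML of the page.
f = urllib.urlopen("http://www.python.org")
s = f.read()
f.close()
# html=1 asks libxml2dom to parse the input as HTML rather than XML.
d = libxml2dom.parseString(s, html=1)
# Query the resulting DOM with XPath and count the <table> elements.
print "There are", len(d.xpath("//table")), "tables in the document."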
See http://www.python.org/pypi/libxml2dom for more information.
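For comparison, here is a minimal sketch that builds a W3C-style DOM from HTML using only the Python 3 standard library. It swaps in html.parser.HTMLParser as the parser and xml.dom.minidom as the DOM implementation; it does not use libxml2dom or BeautifulSoup. The DOMBuilder class, the parse_html function and the VOID_ELEMENTS set are hypothetical names chosen for this example. The builder does not apply HTML's implied end-tag rules (an unclosed <p> simply nests whatever follows it), so it is only suitable for reasonably well-formed markup. Because the result is a minidom Document, standard DOM Core methods such as getElementsByTagName work on it.

```python
from html.parser import HTMLParser
from xml.dom.minidom import getDOMImplementation

# Hypothetical (and not exhaustive) set of HTML elements that never take an end tag.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "param", "source", "track", "wbr"}


class DOMBuilder(HTMLParser):
    """Hypothetical helper: turn HTMLParser events into a minidom Document."""

    def __init__(self):
        super().__init__()
        impl = getDOMImplementation()
        # Create an empty document whose root element is <html>.
        self.document = impl.createDocument(None, "html", None)
        # Stack of currently open elements; index 0 is always the root.
        self.stack = [self.document.documentElement]

    def handle_starttag(self, tag, attrs):
        if tag == "html":
            # The root already exists; only copy its attributes.
            for name, value in attrs:
                self.document.documentElement.setAttribute(name, value or "")
            return
        element = self.document.createElement(tag)
        for name, value in attrs:
            element.setAttribute(name, value or "")
        self.stack[-1].appendChild(element)
        # Void elements have no content, so they are never left open.
        if tag not in VOID_ELEMENTS:
            self.stack.append(element)

    def handle_endtag(self, tag):
        # Close back to the matching open element; stray end tags are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tagName == tag:
                del self.stack[i:]
                return

    def handle_data(self, data):
        # Character references are already decoded (convert_charrefs=True).
        self.stack[-1].appendChild(self.document.createTextNode(data))


def parse_html(text):
    """Parse an HTML string and return an xml.dom.minidom Document."""
    builder = DOMBuilder()
    builder.feed(text)
    builder.close()  # flush any buffered trailing text
    return builder.document


if __name__ == "__main__":
    page = ("<html><body><table><tr><td>a</td></tr></table>"
            "<p>x<br>y</p><table></table></body></html>")
    doc = parse_html(page)
    # Prints: There are 2 tables in the document.
    print("There are", len(doc.getElementsByTagName("table")),
          "tables in the document.")
```

Handing the parser's events to a standard DOM implementation, rather than inventing a new tree type, is what gives you "the DOM" interface; a forgiving HTML parser would still be needed for real-world pages.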
Paul