Re: [lxml-dev] Question about etree vs html

1 Sep 2010

      On Mon, 2010-08-30 at 17:34 +0300, Dimitrios Pritsos wrote:
...
I am Dimitrios Pritsos and I am working on a web crawler. To analyse
the pages I get while crawling, I am using lxml. However, I cannot tell
the difference between lxml.html and lxml.etree when it comes to XHTML
parsing. In particular, I am confused about which of the many options
lxml provides I should use.
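
Hi, I think lxml.html and lxml.etree do mostly the same thing, since
lxml.html is built on top of lxml.etree. The difference is that
lxml.html elements have some extra properties and methods specific to
HTML, like:
.head
.body

Also, lxml.html parses with its own HTMLParser, which is built on
etree.HTMLParser(), while etree gives you more parsers to choose from,
e.g. etree.XMLParser().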
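
A small sketch of the difference, in case it helps. The sample markup
and variable names are made up for illustration; only the lxml calls
themselves (html.fromstring, etree.fromstring, etree.HTMLParser) come
from the library.

```python
from lxml import etree, html

# Hypothetical sample page, used only for this illustration.
page = ("<html><head><title>Demo</title></head>"
        "<body><p>Hello <b>world</b></p></body></html>")

# lxml.html: HTML parser plus element classes with HTML-specific helpers.
doc = html.fromstring(page)
title = doc.head.findtext("title")   # .head is an lxml.html convenience
text = doc.body.text_content()       # all text below <body>, tags dropped

# lxml.etree with the HTML parser: same tree shape, plain etree elements.
tree = etree.fromstring(page, etree.HTMLParser())
has_head_attr = hasattr(tree, "head")  # plain elements have no .head

# lxml.etree with the default XML parser: suitable for well-formed XHTML.
# Tags then carry the XHTML namespace.
xhtml = ('<html xmlns="http://www.w3.org/1999/xhtml">'
         "<head><title>Demo</title></head><body><p>Hi</p></body></html>")
xroot = etree.fromstring(xhtml)

print(title, "|", text, "|", tree.tag, "|", has_head_attr, "|", xroot.tag)
```

So for crawling pages of unknown quality, lxml.html is probably the
more convenient entry point, and the etree XML parser only makes sense
when the XHTML is known to be well-formed.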

I'm developing a kind of web crawler too, but problems with parsing bad
HTML come from libxml2, not from lxml itself. lxml is just a Python
wrapper around libxml2 and libxslt (which are written in C).
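
To illustrate that point, here is a minimal sketch with an invented
piece of broken markup. The HTML parser recovers from it, the XML
parser refuses it, and etree.LIBXML_VERSION reports which libxml2 is
doing the work underneath.

```python
from lxml import etree

# Hypothetical broken input: neither <p> nor <b> is closed.
broken = "<p>unclosed <b>bold"

# The HTML parser repairs the markup and wraps it in html/body.
root = etree.fromstring(broken, etree.HTMLParser())
bold = root.find(".//b")

# The strict XML parser rejects the same input.
try:
    etree.fromstring(broken)
    xml_rejected = False
except etree.XMLSyntaxError:
    xml_rejected = True

print(root.tag, bold.text, xml_rejected)
print(etree.LIBXML_VERSION)  # version tuple of the libxml2 in use
```

How a given broken page gets repaired can vary with the libxml2
version, so odd results on bad HTML are worth checking against that
version first.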

Cheers,
-- 
Sérgio M. B.

Sergio Monteiro Basto