Any equivalent to Ruby's 'hpricot' html/xpath/css selector package?
stefan_ml at behnel.de
Tue Dec 30 14:26:37 CET 2008
Bruno Desthuilliers wrote:
>> However, what makes it really useful is that it does a good job of
>> handling the "broken" html that is so commonly found on the web.
> BeautifulSoup ?
> possibly with ElementSoup ?
It's actually debatable whether BeautifulSoup is any better than
lxml/libxml2 at parsing broken HTML, since lxml tends to tidy things up
pretty well. The only major difference is in encoding detection, and for
that you can use a separate tool like chardet.
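
A minimal sketch of that combination, assuming both the chardet and lxml
packages are installed. The function name parse_broken_html and the
"utf-8" fallback are illustrative choices of mine, not part of either
library: chardet guesses the encoding of the raw bytes, and the guess is
handed to lxml's HTML parser as an encoding override.

```python
import chardet
import lxml.html


def parse_broken_html(data, encoding=None, fallback="utf-8"):
    """Parse raw HTML bytes, guessing the encoding with chardet if not given.

    `fallback` is a hypothetical default used only when chardet cannot
    make a guess at all.
    """
    if encoding is None:
        # chardet.detect() returns a dict; its 'encoding' entry may be
        # None when no guess is possible.
        encoding = chardet.detect(data)["encoding"] or fallback
    # Override the parser's encoding so libxml2 does not have to guess.
    parser = lxml.html.HTMLParser(encoding=encoding)
    return lxml.html.fromstring(data, parser=parser)


if __name__ == "__main__":
    # Tag soup: no closing tags at all, non-ASCII text, no charset hint.
    raw = (
        "<html><body><p>Grüße aus München, Köln, Düsseldorf und Zürich"
        "<b>unclosed"
    ).encode("utf-8")
    root = parse_broken_html(raw)
    print(lxml.html.tostring(root, encoding="unicode"))
```

Note that chardet only makes a statistical guess, so short or mostly-ASCII
input can be misdetected. If you already know the encoding (say, from the
HTTP headers), pass it in explicitly and skip the detection step.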