Any equivalent to Ruby's 'hpricot' html/xpath/css selector package?

Stefan Behnel stefan_ml at
Tue Dec 30 14:26:37 CET 2008

Bruno Desthuilliers wrote:
>> However, what makes it really useful is that it does a good job of
>> handling the "broken" html that is so commonly found on the web.
> BeautifulSoup ?
> possibly with ElementSoup ?

It's actually debatable whether BeautifulSoup is any better than lxml/libxml2
at parsing broken HTML, as lxml tends to tidy things up pretty well. The only
major difference is in encoding detection, for which you can also use a
separate tool like chardet.
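
The combination described above can be sketched in Python as follows. This is a minimal, hypothetical illustration, not code from the original post. It assumes the third-party lxml package is installed and treats chardet as optional. The helper name `parse_broken_html` is made up for this example. chardet guesses the encoding of the raw bytes, the bytes are decoded, and lxml's HTML parser then repairs the unclosed tags so the tree can be queried with XPath.

```python
# A minimal sketch, assuming lxml is installed; chardet is optional here
# (e.g. pip install lxml chardet). parse_broken_html is a hypothetical name.
import lxml.html

try:
    import chardet
except ImportError:  # no separate detector; fall back to lxml's own handling
    chardet = None


def parse_broken_html(raw: bytes) -> lxml.html.HtmlElement:
    """Parse possibly broken HTML bytes into an lxml element tree."""
    if chardet is not None:
        # chardet.detect() returns a dict with an "encoding" key,
        # which may be None if it could not make a guess.
        encoding = chardet.detect(raw)["encoding"]
        if encoding:
            try:
                text = raw.decode(encoding, errors="replace")
            except LookupError:  # codec name unknown to Python
                pass
            else:
                return lxml.html.fromstring(text)
    # Without a usable guess, hand the raw bytes to libxml2 and let it
    # apply its own encoding detection.
    return lxml.html.fromstring(raw)


if __name__ == "__main__":
    # Unclosed <li> and <p> tags, as commonly found on real pages.
    broken = b"<html><body><ul><li>one<li>two</ul><p>First<p>Second</body>"
    doc = parse_broken_html(broken)
    print(doc.xpath("//li/text()"))
    print([p.text for p in doc.xpath("//p")])
```

Here libxml2 closes the open `<li>` and `<p>` elements itself, so the XPath queries should find two list items and two paragraphs. How reliable chardet's guess is on short inputs is worth checking against real pages.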