[Baypiggies] HTML Parsers (n00b)
alecf at flett.org
Fri Jan 29 01:58:13 CET 2010
lxml is awesome. Don't be fooled by the name: it has a great understanding of
HTML, even malformed HTML.
ianbicking did a great comparison years ago but it still stands:
and an update:
Basically: lxml is fast as hell (it uses libxml2 under the hood), has a low
memory footprint, and is very forgiving of wacky HTML, more so than Beautiful Soup.
I think pyquery actually uses lxml under the hood, or at least libxml2.
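
To make the "forgiving of wacky HTML" point concrete, here is a minimal sketch of feeding lxml some markup with missing closing tags. The sample markup and the helper name extract() are made up for illustration; only lxml.html.document_fromstring, xpath() and text_content() come from lxml itself.

```python
import lxml.html

# Deliberately sloppy markup: the <li> items, the <a> and the <body> are never closed.
MARKUP = "<html><body><ul><li>one<li>two</ul><a href='/x'>link</body>"


def extract(markup):
    """Return (list item texts, link hrefs) from possibly malformed HTML.

    extract() is a hypothetical helper written for this example.
    """
    # document_fromstring always returns the root <html> element,
    # repairing the tree where tags are missing.
    doc = lxml.html.document_fromstring(markup)
    items = [li.text_content().strip() for li in doc.xpath("//li")]
    hrefs = doc.xpath("//a/@href")
    return items, hrefs


if __name__ == "__main__":
    items, hrefs = extract(MARKUP)
    print(items)
    print(hrefs)
```

The parser closes the dangling tags itself, so the XPath queries see a normal tree and no cleanup pass is needed beforehand.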
On Thu, Jan 28, 2010 at 3:43 PM, Max Slimmer <max at theslimmers.net> wrote:
> I like lxml
> On Thu, Jan 28, 2010 at 3:23 PM, Kimball Bighorse <kbighorse at yahoo.com> wrote:
>> Looking at Beautiful Soup, html5lib and pyquery. Anything else I should be
>> aware of?
>> Many thanks,
>> Baypiggies mailing list
>> Baypiggies at python.org
>> To change your subscription options or unsubscribe: