[Baypiggies] HTML Parsers (n00b)

Fri Jan 29 02:26:08 CET 2010

No question about lxml's speed.  I'm using it (as part of Deliverance) on a
current project to re-theme a website on the fly.

But for day-to-day use, it's Beautiful Soup.  I can't resist pure Python :)

  -- Jeff
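Here is a minimal sketch of the kind of day-to-day Beautiful Soup use Jeff describes. It assumes the modern `bs4` package; the 2010-era Beautiful Soup 3 was imported as `from BeautifulSoup import BeautifulSoup` instead. Passing `"html.parser"` selects Python's built-in parser, so the whole stack stays pure Python. `SAMPLE` and `extract_links` are hypothetical names used only for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page used only for illustration.
SAMPLE = """<html><head><title>Example</title></head>
<body><p>See <a href="/a">A</a> and <a href="/b">B</a>.</p></body></html>"""


def extract_links(markup):
    """Return the page title and every link href, in document order.

    "html.parser" is Python's standard-library parser, so no C extension
    is involved in this sketch.
    """
    soup = BeautifulSoup(markup, "html.parser")
    title = soup.title.string if soup.title else None
    # href=True skips <a> tags that have no href attribute.
    hrefs = [a["href"] for a in soup.find_all("a", href=True)]
    return title, hrefs


if __name__ == "__main__":
    print(extract_links(SAMPLE))
```

Swapping `"html.parser"` for `"lxml"` keeps the same Beautiful Soup API but hands the actual parsing to lxml, if it is installed.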

On Thu, Jan 28, 2010 at 4:58 PM, Alec Flett <alecf at flett.org> wrote:

> lxml is awesome. Don't be fooled by the name: it has a great understanding
> of HTML, even malformed HTML.
>
> Ian Bicking did a great comparison years ago, and it still holds up:
> http://blog.ianbicking.org/2008/03/30/python-html-parser-performance/
>
> and an update:
>
> http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-library/
>
> Basically: lxml is fast as hell (it uses libxml2 under the hood), has a low
> memory footprint, and is very forgiving of wacky HTML, more so than Beautiful Soup.
>
> I think pyquery actually uses lxml under the hood? Or at least libxml2?
>
>
> Alec
>
>
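Alec's point that lxml copes with malformed HTML can be checked with a short sketch, assuming `lxml` is installed. The markup is a hypothetical example with unclosed `<li>` and `<p>` tags; `lxml.html.document_fromstring` hands it to libxml2's HTML parser, which repairs the tree so ordinary XPath queries still work. `BROKEN` and `parse_broken` are illustrative names, not part of lxml.

```python
import lxml.html

# Hypothetical sloppy markup: the <li> and <p> tags are never closed.
BROKEN = '<ul><li>one<li>two</ul><p>Para with <a href="/x">a link</a>'


def parse_broken(markup):
    """Parse sloppy HTML with lxml and pull out list items and link targets."""
    # document_fromstring always returns the <html> root element,
    # adding <html> and <body> wrappers when the input lacks them.
    root = lxml.html.document_fromstring(markup)
    # libxml2 closes the first <li> when the second one starts,
    # so each item ends up with its own text node.
    items = [str(t) for t in root.xpath("//li/text()")]
    hrefs = [str(h) for h in root.xpath("//a/@href")]
    return items, hrefs


if __name__ == "__main__":
    print(parse_broken(BROKEN))
    # Serialize the repaired tree to see where the missing end tags went.
    repaired = lxml.html.document_fromstring(BROKEN)
    print(lxml.html.tostring(repaired, pretty_print=True).decode())
```

The same tree objects are what higher-level tools built on lxml work with, so the repair happens once at parse time rather than in each query.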