[Baypiggies] HTML Parsers (n00b)
jeff.enderwick at gmail.com
Sun Jan 31 23:58:48 CET 2010
I've used Beautiful Soup to programmatically extract content from Word docs
saved as HTML - yuck!!!
Beautiful Soup performed ... beautifully :-). Speed was NOT a consideration
for me, though.
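Beautiful Soup and lxml are both third-party packages. The sketch below is a dependency-free illustration of the "pure python" approach. It uses the standard library's `html.parser` as a swapped-in technique and is not Beautiful Soup's own API. `WordHTMLTextExtractor` and `extract_text` are hypothetical names. The sketch assumes that Word's HTML export keeps its CSS in `<style>` inside `<head>`, and that it wraps Office-specific XML in conditional comments (`<!--[if gte mso 9]>...<![endif]-->`).

```python
from html.parser import HTMLParser


class WordHTMLTextExtractor(HTMLParser):
    """Collect visible text, skipping anything inside <head>, <style> or <script>.

    Hypothetical helper for illustration; not part of any library.
    """

    SKIP_TAGS = {"head", "style", "script"}

    def __init__(self):
        # convert_charrefs=True turns entities such as &nbsp; into characters
        # before handle_data() sees the text.
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # > 0 while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth:
            return
        # Collapse runs of whitespace (including the non-breaking spaces
        # Word sprinkles everywhere) and drop chunks that are only whitespace.
        text = " ".join(data.split())
        if text:
            self._chunks.append(text)

    # handle_comment() is not overridden, so the base class ignores comments,
    # including Word's conditional comments full of Office XML.

    def text(self):
        # Chunks from different elements are joined with a single space.
        return " ".join(self._chunks)


def extract_text(html):
    """Return the visible text of an HTML string as one whitespace-normalized line."""
    parser = WordHTMLTextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return parser.text()
```

For example, `extract_text('<p class=MsoNormal>Hello <b>world</b><o:p>&nbsp;</o:p></p>')` gives `"Hello world"`. Beautiful Soup 4 can use this same stdlib parser as its backend via `BeautifulSoup(markup, "html.parser")`, or lxml via `BeautifulSoup(markup, "lxml")`. This is one way to get Beautiful Soup's interface with lxml's speed.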
On Thu, Jan 28, 2010 at 5:26 PM, Jeff Kunce <jjkunce at gmail.com> wrote:
> No question about lxml's speed. I'm using it (as part of Deliverance) on a
> current project to re-theme a website on the fly.
> But for day-to-day use, it's Beautiful Soup. I can't resist pure Python :)
> -- Jeff
> On Thu, Jan 28, 2010 at 4:58 PM, Alec Flett <alecf at flett.org> wrote:
>> lxml is awesome; don't be fooled by the name - it has a great understanding
>> of HTML, even malformed HTML.
>> ianbicking did a great comparison years ago but it still stands:
>> and an update:
>> Basically: lxml is fast as hell (it uses libxml2 under the hood), has a low
>> memory footprint, and is very forgiving of wacky HTML, more so than Beautiful Soup.
>> I think pyquery actually uses lxml under the hood? Or at least libxml2?