[Chicago] BeautifulSoup gone bad

Kumar McMillan kumar.mcmillan at gmail.com
Thu Mar 12 16:31:51 CET 2009


On Thu, Mar 12, 2009 at 7:25 AM, Martin Maney <maney at two14.net> wrote:
>
> Version 3.1.0 is both slower and less robust than the previous release
> because it has changed to be usable in Python 3.0.
>
>  http://www.crummy.com/software/BeautifulSoup/3.1-problems.html

I have had a lot of success with lxml.html, an alternative to
BeautifulSoup that is much faster because it is built on the C library
libxml2:
http://codespeak.net/lxml/lxmlhtml.html

What's really nice is that you can use full XPath expressions on
crummy, poorly-formed HTML (the language of the Web!).  For a while
lxml was a bit unstable and hard to build on the Mac, but with recent
versions I have not had any problems.
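
To give a rough idea of what that looks like, here is a minimal sketch,
assuming lxml is installed.  The markup and the "links" id are made up
for illustration; they are not from any real page.

```python
from lxml import html

# Deliberately sloppy markup (hypothetical): unclosed <p>, <li> and <a>
# tags, and no <html> or <body> wrapper.
broken = """
<div id="links">
  <p>Some links
  <ul>
    <li><a href="/one">One
    <li><a href="/two">Two</a>
  </ul>
</div>
"""

# lxml.html hands the text to libxml2's forgiving HTML parser, which
# repairs the tree instead of raising an error.
doc = html.fromstring(broken)

# An ordinary XPath query with a predicate and an attribute step.
hrefs = [str(h) for h in doc.xpath('//div[@id="links"]//a/@href')]
print(hrefs)
```

The exact shape of the repaired tree depends on libxml2's recovery
rules, so I would query by attributes like this rather than rely on
precise nesting.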


