[BangPypers] HTML Parsing in python

Ramkumar R artagnon at gmail.com
Thu Sep 10 14:49:13 CEST 2009


> +1 Beautiful Soup

The author is no longer interested in maintaining BeautifulSoup (see
http://www.crummy.com/software/BeautifulSoup/3.1-problems.html). The
BeautifulSoup port to Python 3.x is pretty terrible, as it's based on
the error-intolerant HTMLParser, which fails on malformed markup.
While it's a fantastic library for beginners (my first Python program
was a BeautifulSoup screen scraper), I wouldn't recommend using it in
production. You might want to check out html5lib, which is still in
its infancy, or use cElementTree (the C implementation of
ElementTree).
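
To give a rough idea of the ElementTree route, here is a minimal sketch
that pulls links out of a page. It assumes the markup is well-formed
(XHTML-style); the SAMPLE string and the extract_links helper are made
up for illustration. On current Python versions the C implementation
is picked up automatically by xml.etree.ElementTree, so the sketch
imports that module rather than cElementTree directly.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup; real pages are often messier.
SAMPLE = """<html>
  <body>
    <a href="http://example.com/one">One</a>
    <p>Some text <a href="http://example.com/two">Two</a></p>
    <a>No href here</a>
  </body>
</html>"""


def extract_links(markup):
    """Return (href, text) pairs for every <a> element that has an href."""
    root = ET.fromstring(markup)
    links = []
    # iter("a") walks the whole tree in document order.
    for anchor in root.iter("a"):
        href = anchor.get("href")
        if href is not None:
            links.append((href, (anchor.text or "").strip()))
    return links


if __name__ == "__main__":
    for href, text in extract_links(SAMPLE):
        print(href, text)
```

Note that ET.fromstring raises ET.ParseError on markup that is not
well-formed (an unclosed tag, for example), so for tag soup from the
wild a lenient parser such as html5lib is likely the better fit.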

