[BangPypers] HTML Parsing in python
Anand Balachandran Pillai
abpillai at gmail.com
Tue Oct 20 15:02:58 CEST 2009
On Thu, Sep 10, 2009 at 7:44 PM, Puneet Aggarwal <look4puneet at gmail.com> wrote:
> Thanks all for the suggestions. I think I will start with BeautifulSoup
> (3.0.7a) and will experiment with other suggested libs if it does not fit
> into my requirement or if I face issues with this.
You are not going to believe this, but the creator of BeautifulSoup
advised me to use the SGMLParser class from Python's sgmllib module for
parsing HTML. This was back in 2004 (or 2005), when I had written to him
about a parser for HarvestMan. He advised me to derive a wrapper from
SGMLParser, and that's what I did.
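
For anyone who wants to see the general shape of such a wrapper, here is a
minimal sketch. It is not the HarvestMan parser. Since sgmllib was removed
in Python 3, the sketch derives from html.parser.HTMLParser instead, which
offers a similar handler-based interface. The names LinkCollector and
extract_links are made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical wrapper that records the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names before calling us;
        # attrs is a list of (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Feed an HTML string to the wrapper and return the collected links."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

Calling extract_links() on a page's source returns the href values in
document order. Other handlers (handle_endtag, handle_data) can be
overridden the same way if you need more than links.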
In case you are interested, you can check out the HTML parser used in
It is available at,
> On Thu, Sep 10, 2009 at 7:07 PM, Baishampayan Ghose <b.ghose at gmail.com> wrote:
>> > Can anyone suggest a good library for HTML parsing in Python?
>> > I googled and found a few libraries: BeautifulSoup, HTMLParser, SGMLParser.
>> > Can anyone suggest which one I should go for, from your experience?
>> BeautifulSoup was OK, but now it's broken. Use lxml, it's very good.
>> Baishampayan Ghose
>> b.ghose at gmail.com
>> BangPypers mailing list
>> BangPypers at python.org