[BangPypers] HTML Parsing in python

Yuvi Panda yuvipanda at gmail.com
Tue Oct 20 15:04:55 CEST 2009


I use lxml.html. It's just as good, and MUCH faster. It's a pain to install, though.
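
For anyone who wants to try it, here is a minimal sketch of pulling links out of a page with lxml.html. The sample markup and the helper name extract_links are made up for illustration, and it assumes lxml is already installed.

```python
import lxml.html

# Made-up sample page; note the unclosed <p>, which lxml tolerates.
SAMPLE = """
<html><body>
<h1>BangPypers</h1>
<a href="http://codespeak.net/lxml/">lxml</a>
<a href="/listinfo/bangpypers">list info</a>
<p>Unclosed paragraph
</body></html>
"""


def extract_links(markup):
    """Return (link text, href) pairs for every <a> that has an href."""
    doc = lxml.html.fromstring(markup)
    return [
        (a.text_content().strip(), a.get("href"))
        for a in doc.xpath("//a[@href]")
    ]


links = extract_links(SAMPLE)
title = lxml.html.fromstring(SAMPLE).findtext(".//h1")

for text, href in links:
    print(text, href)
```

The XPath query is just one way to select elements; lxml.html also offers iterlinks() and CSS selectors (the latter needs the separate cssselect package).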

On Tue, Oct 20, 2009 at 6:32 PM, Anand Balachandran Pillai <
abpillai at gmail.com> wrote:

>
>
> On Thu, Sep 10, 2009 at 7:44 PM, Puneet Aggarwal <look4puneet at gmail.com> wrote:
>
>> Thanks all for the suggestions. I think I will start with BeautifulSoup
>> (3.0.7a) and will experiment with other suggested libs if it does not fit
>> into my requirement or if I face issues with this.
>>
>
>  You are not going to believe this, but the creator of BeautifulSoup
>  (Leonard Richardson) advised me to use Python's SGMLParser (in the
>  sgmllib module) for parsing HTML. This was back in 2004 (or 2005), when
>  I had written to him about using BeautifulSoup as the parser in
>  HarvestMan. He advised me to derive a wrapper from SGMLParser, and
>  that's what I did.
>
>  In case you are interested, you can check out the HTML parser used in
>  HarvestMan. It is available at:
>
> http://harvestman-crawler.googlecode.com/svn/trunk/HarvestMan/harvestman/lib/pageparser.py
>
>
>
>>
>> On Thu, Sep 10, 2009 at 7:07 PM, Baishampayan Ghose <b.ghose at gmail.com> wrote:
>>
>>> > Can anyone suggest a good library for HTML parsing in Python?
>>> > I googled and found a few libraries: BeautifulSoup, HTMLParser,
>>> > SGMLParser, etc.
>>> >
>>> > Which one would you recommend, from your experience?
>>>
>>> BeautifulSoup was OK, but now it's broken. Use lxml, it's very good.
>>>
>>> http://codespeak.net/lxml/
>>>
>>> Regards,
>>> BG
>>>
>>>
>>> --
>>> Baishampayan Ghose
>>> b.ghose at gmail.com
>>> _______________________________________________
>>> BangPypers mailing list
>>> BangPypers at python.org
>>> http://mail.python.org/mailman/listinfo/bangpypers
>>>
>>
>
>
> --
> --Anand
>
>
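
The approach Anand describes in the quoted message, deriving your own class from a stock event-driven parser, can be sketched roughly as below. This is not the HarvestMan code. sgmllib (where SGMLParser lives) exists only in Python 2 and was removed in Python 3, so this sketch substitutes the standard library's html.parser.HTMLParser, which is subclassed in much the same way. The names LinkCollector and collect_links are made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag seen while parsing."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def collect_links(markup):
    """Feed markup through a fresh LinkCollector and return its links."""
    parser = LinkCollector()
    parser.feed(markup)
    parser.close()
    return parser.links


print(collect_links('<p><a href="a.html">A</a> <A HREF=b.html>B</a></p>'))
```

The parser calls handle_starttag once per opening tag, so anything else you want to collect (images, forms, meta tags) is a matter of adding more branches or more handler methods.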


-- 
Yuvi Panda T
http://yuvisense.net