[BangPypers] HTML Parsing in python

Puneet Aggarwal look4puneet at gmail.com
Thu Sep 10 16:16:47 CEST 2009


Hi Dhananjay,

My requirement is simple: I need to extract information from a page. But the
pages can be malformed HTML, or any junk HTML at all, so tolerance is
required.
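
To illustrate the kind of tolerance meant here, below is a minimal sketch,
not a recommendation of one library over another. It assumes a current
Python 3 and uses the standard library's html.parser module (the HTMLParser
mentioned in this thread; the module was named HTMLParser in Python 2). The
names LinkExtractor and extract_links and the sample markup are hypothetical,
made up for this example. The sketch pulls link targets and link text out of
markup with unquoted attributes and unclosed tags.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (href, text) pairs from <a> tags, tolerating broken markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.links = []     # finished (href, text) pairs
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text chunks seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # A new <a> while another is still open: record the open one first.
            self._flush()
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_endtag(self, tag):
        if tag == "a":
            self._flush()

    def handle_data(self, data):
        # Only keep text that appears inside a link that has an href.
        if self._href is not None:
            self._text.append(data)

    def _flush(self):
        if self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
        self._href = None
        self._text = []

    def close(self):
        # Process any buffered input, then record a link left open at the end.
        super().close()
        self._flush()


def extract_links(html_text):
    """Return a list of (href, text) pairs found in html_text."""
    parser = LinkExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.links


# Junk input: unquoted attribute, nested unclosed <a>, unclosed <b>.
junk = "<html><body><p>Hello <a href='/one'>first<a href=/two>second</a></p><b>unclosed"
print(extract_links(junk))
```

The parser does not raise on the broken input; it reports whatever tags and
text it can recognise, and the handler methods decide what to keep. Whether
this level of tolerance is enough, compared with the other libraries named in
this thread, depends on how bad the pages are.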

Thanks,
Puneet


On Thu, Sep 10, 2009 at 7:33 PM, Dhananjay Nene <dhananjay.nene at gmail.com> wrote:

> Do you require tolerance for non-well-formed XML / HTML? If yes, you may
> consider sgmlop: http://effbot.org/zone/sgmlop-index.htm
>
>
> On Thu, Sep 10, 2009 at 7:07 PM, Baishampayan Ghose <b.ghose at gmail.com> wrote:
>
>> > Can anyone suggest a good library for HTML parsing in Python?
>> > I googled and found a few libraries: BeautifulSoup, HTMLParser, SGMLParser,
>> > etc.
>> >
>> > Can anyone suggest which one I should go for, based on your experience?
>>
>> BeautifulSoup was OK, but now it's broken. Use lxml, it's very good.
>>
>> http://codespeak.net/lxml/
>>
>> Regards,
>> BG
>>
>>
>> --
>> Baishampayan Ghose
>> b.ghose at gmail.com
>> _______________________________________________
>> BangPypers mailing list
>> BangPypers at python.org
>> http://mail.python.org/mailman/listinfo/bangpypers
>>
>
>
>
> --
> --------------------------------------------------------
> blog: http://blog.dhananjaynene.com
> twitter: http://twitter.com/dnene
>