[Tutor] Web crawling!

Arun Tomar tomar.arun at gmail.com
Wed Jul 29 20:43:46 CEST 2009


Hi Raj,

On Wed, Jul 29, 2009 at 9:29 PM, Raj Medhekar <cosmicsand27 at yahoo.com> wrote:

> Does anyone know a good webcrawler that could be used in tandem with the
> Beautiful soup parser to parse out specific elements from news sites like
> BBC and CNN? Thanks!
> -Raj
>

I couldn't find a good web crawler that met my client's needs, so I wrote one
for them. It is specific to their requirements, though, and I can't disclose
any more details about it.

In short, my app crawls the specific sites, parses each page with Beautiful
Soup, and extracts all the links on that page. It then visits each link and
searches those pages for the keywords. If a keyword occurs more often than
the specified limit, the link is considered useful and is stored in the
database; otherwise it is skipped.
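
The flow described above might be sketched roughly as follows. This is not the
actual app, just a minimal illustration under some assumptions: every name in
it (LinkCollector, fetch, extract_links, count_keyword, find_useful_links) is
made up for the example, it uses the standard library's html.parser instead of
Beautiful Soup so that it runs without extra installs, it counts keyword hits
in the raw markup, and it returns the useful links as a list rather than
writing them to a database.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Download a page and return its text (assumes UTF-8 is close enough)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_links(html, base_url):
    """Return absolute URLs for all links found in the page."""
    parser = LinkCollector()
    parser.feed(html)
    return [urljoin(base_url, href) for href in parser.links]


def count_keyword(html, keyword):
    """Count case-insensitive occurrences of keyword in the raw page text."""
    return len(re.findall(re.escape(keyword), html, flags=re.IGNORECASE))


def find_useful_links(start_url, keyword, min_count, fetch_page=fetch):
    """Visit each link on start_url; keep those with more than min_count hits."""
    useful = []
    seen = set()
    for link in extract_links(fetch_page(start_url), start_url):
        if link in seen:
            continue  # do not visit the same link twice
        seen.add(link)
        try:
            page = fetch_page(link)
        except OSError:
            continue  # unreachable page: skip it
        if count_keyword(page, keyword) > min_count:
            useful.append(link)
    return useful
```

The fetch_page parameter is there only so the crawl can be tried against canned
pages without touching the network. To stay closer to the setup described
above, extract_links could be swapped for a Beautiful Soup based version, and
the returned list could be written to whatever database is in use.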


-- 
Regards,
Arun Tomar
blog: http://linuxguy.in
website: http://www.solutionenterprises.co.in

