[BangPypers] Website change tracker

Sriram Narayanan sriramnrn at gmail.com
Sat Jun 9 03:43:42 CEST 2012


If you need to check for the absence of certain content, then write tests
using Sahi or Selenium, and run those at periodic intervals.
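
As a rough illustration of such a periodic absence check, here is a minimal
sketch in plain Python. It stands in for a real Sahi or Selenium test and only
looks at the raw page text, so it will not see content rendered by JavaScript.
The function names and the idea of passing a list of required phrases are
assumptions made for this example, not part of either tool.

```python
import urllib.request


def missing_phrases(page_text, required_phrases):
    """Return the required phrases that do not appear in page_text.

    The comparison is case-insensitive and purely textual.
    """
    lowered = page_text.lower()
    return [p for p in required_phrases if p.lower() not in lowered]


def fetch(url, timeout=30):
    """Download a page and decode it as text (UTF-8 is assumed here)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def check(url, required_phrases):
    """Fetch url and report which required phrases are absent from it."""
    return missing_phrases(fetch(url), required_phrases)
```

Calling check() with a page URL and the phrases you expect to find returns the
ones that have gone missing; a cron job or any other scheduler can run it at
whatever interval suits you.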

Ram
On Jun 8, 2012 11:21 PM, "Bhavya" <bhavya.mayur at gmail.com> wrote:

> Thanks, everyone :) Much appreciated.
> I will work on it and let the group know how it goes.
>
> Thanks,
> Bhavya
>
> On Fri, Jun 8, 2012 at 1:06 PM, vid <vid at svaksha.com> wrote:
>
> > On Fri, Jun 8, 2012 at 4:09 PM, kracethekingmaker
> > <kracethekingmaker at gmail.com> wrote:
> > >
> > >> Hello,
> > >>
> > >> I am a newbie to Python coding, and I have a question. I want to write a
> > >> script that will check for content changes in websites and send e-mail
> > >> to an admin whenever there are changes.
> > >
> > > How many times a day, or how often, will this check be performed?
> > >
> > > You should look into md5 hashing and diff utilities for detecting the
> > > changes; for web scraping, the scrapy library is advised.
> > >
> > >> Ideally this script/program should be scalable to, say, about 1000
> > >> websites at a time.
> >
> > 1000 sites at a time? Wow, that's huge. Scraping that many sites is
> > resource intensive and would need a nice, big, stable server that can
> > handle the huge data dumps. FWIW, Scrapy on its own will only dump the
> > data to files such as JSON, so read up a little on the database you want
> > to use, the frontend to serve it, a queueing system to scale to 1000
> > sites, etc. Also, some sites instantly ban scrapers. Watch out for that,
> > and good luck :)
> >
> > --
> > Regards,
> > Vid
> > ॥ http://svaksha.com
> > _______________________________________________
> > BangPypers mailing list
> > BangPypers at python.org
> > http://mail.python.org/mailman/listinfo/bangpypers
>
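
For the md5 and diff suggestion quoted above, a minimal sketch of the idea
might look like the following. It is only an illustration under some
assumptions: the function names are made up for this example, a plain dict
stands in for whatever storage you pick, and fetching the pages and sending
the e-mail are left out.

```python
import difflib
import hashlib


def fingerprint(page_text):
    """Return the MD5 hex digest of a page's text."""
    return hashlib.md5(page_text.encode("utf-8")).hexdigest()


def detect_change(url, page_text, known_digests):
    """Compare page_text with the digest last stored for url.

    known_digests is a dict (url -> digest) used as a stand-in for real
    storage. The new digest is always recorded. A first visit is not
    reported as a change.
    """
    new_digest = fingerprint(page_text)
    old_digest = known_digests.get(url)
    known_digests[url] = new_digest
    return old_digest is not None and old_digest != new_digest


def summarize_diff(old_text, new_text):
    """Return a unified diff of two page versions, e.g. for an e-mail body."""
    lines = difflib.unified_diff(
        old_text.splitlines(), new_text.splitlines(),
        fromfile="previous", tofile="current", lineterm="")
    return "\n".join(lines)
```

Storing only the digest keeps the per-site state small, which helps when
tracking around 1000 sites; keeping the previous page text as well is only
needed if the e-mail should show what actually changed. Note that pages with
timestamps or rotating ads will hash differently on every fetch, so some
cleanup of the text before hashing is usually needed.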

