[BangPypers] Website change tracker

vid vid at svaksha.com
Fri Jun 8 19:06:00 CEST 2012


On Fri, Jun 8, 2012 at 4:09 PM, kracethekingmaker
<kracethekingmaker at gmail.com> wrote:
>
>> Hello,
>>
>> I am a newbie to Python coding, and I have a question. I want to write a
>> script which will check for content changes in websites and send an e-mail
>> to an admin whenever there are changes.
>
> How many times a day, or how often, will this check be performed?
>
> You should look into md5 and diff utilities; for web scraping, the Scrapy
> library is advised.
>
>> Ideally this script/program should be scalable for say about 1000 websites
>> at a time..

1000 sites at a time? Wow, that's huge. Scraping that many sites is
resource intensive, so you would need a nice big stable server that can
handle the huge data dumps. Fwiw, out of the box Scrapy just dumps the
scraped data to files (JSON, for example), so read up a little on the
database you want to use, the frontend to serve it, a queueing system to
scale to 1000 sites, etc. Also, some sites instantly ban scrapers. Watch
out for that, and good luck :)
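
For the md5/diff part suggested above, here is a minimal sketch using only
the standard library. It assumes you already have the previous and current
page bodies as strings; fetching the pages and sending the e-mail are left
out. The names fingerprint and detect_change are made up for illustration
and are not part of Scrapy or any other library.

```python
import difflib
import hashlib


def fingerprint(text):
    """Return the MD5 hex digest of a page body (used only to spot changes)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def detect_change(old_text, new_text):
    """Return unified-diff lines if the content changed, else an empty list."""
    # Cheap check first: identical digests mean nothing to report.
    if fingerprint(old_text) == fingerprint(new_text):
        return []
    # Content differs, so build a line-by-line diff for the admin e-mail.
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile="previous",
            tofile="current",
            lineterm="",
        )
    )
```

For 1000 sites you would probably store only the digest per URL and keep
the full body around just for the pages where you want the diff in the
mail. Note that pages with timestamps, ads or session tokens will look
"changed" on every fetch, so you may have to strip those parts first.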

-- 
Regards,
Vid
॥ http://svaksha.com

