Repeatedly crawl website every 1 min
Iuri
iurisilvio at gmail.com
Thu May 11 05:27:21 EDT 2017
Unless you are authorized, don't do it. It costs the website you are
crawling real money in CPU and bandwidth.
Hundreds of concurrent requests can even take down a small server (one
with a bad configuration).
Look at the scrapy package; it is great for scraping, but be friendly to
the websites you are crawling.
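If you do have permission, a minimal sketch of the thread-pool approach is below, using only the standard library. The function names, the worker count, and the 60-second interval are placeholders, and `fetch` stands in for whatever download function you use (requests, urllib, or scrapy's machinery):

```python
import time
from concurrent.futures import ThreadPoolExecutor


def crawl_round(urls, fetch, max_workers=10):
    """Fetch every URL once, with at most max_workers concurrent requests."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))


def crawl_forever(urls, fetch, interval=60.0, max_workers=10):
    """Repeat crawl_round, keeping at least `interval` seconds between starts."""
    while True:
        started = time.monotonic()
        crawl_round(urls, fetch, max_workers=max_workers)
        # Sleep off whatever time remains in the interval, if any.
        time.sleep(max(0.0, interval - (time.monotonic() - started)))
```

Bounding the pool size (instead of one thread per URL) is what keeps you from hammering any server with hundreds of simultaneous requests.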
On 10 May 2017 at 23:22, <liyucun2012 at gmail.com> wrote:
> Hi Everyone,
>
> Thanks for stopping by. I am working on a feature to crawl website content
> every minute. I am curious to know if there is any good open source project
> for this specific scenario.
>
> Specifically, I have many URLs, and I want to maintain a thread pool so
> that each thread repeatedly crawls content from its given URL. There could
> be hundreds of threads running at the same time.
>
> Your help is greatly appreciated.
>
> ;)
> --
> https://mail.python.org/mailman/listinfo/python-list
>