Rate limiting a web crawler
scopensource at gmail.com
Wed Dec 26 10:35:12 EST 2018
I want to build a simple web crawler. I know how I am going to do it, but
I have one problem.
Obviously I don't want to negatively impact any of the websites that I
am crawling, so I want to implement some form of rate limiting for the
HTTP requests I send to each domain name.
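One way to get per-domain limiting is to record when each host was last
requested and sleep just long enough before the next request to that
host. Below is a minimal sketch under that assumption. The class name
DomainRateLimiter, its wait() method and the 5-second default are all
hypothetical, chosen for illustration; this is not a library API. The
clock and sleep functions can be swapped out, which makes it easy to test
without really waiting.

```python
import time
from urllib.parse import urlsplit


class DomainRateLimiter:
    """Enforce a minimum delay between requests to the same host.

    Hypothetical helper for illustration only; not part of any library.
    """

    def __init__(self, min_interval=5.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        # Maps hostname -> clock() reading of the last request to that host.
        self._last_request = {}

    def wait(self, url):
        """Block until a request to url's host is allowed, then record it.

        Returns the (lower-cased) hostname that was rate limited.
        """
        host = urlsplit(url).hostname or ""
        last = self._last_request.get(host)
        if last is not None:
            remaining = self.min_interval - (self._clock() - last)
            if remaining > 0:
                self._sleep(remaining)
        self._last_request[host] = self._clock()
        return host
```

A crawler would call limiter.wait(url) immediately before each fetch.
Requests to different hosts never delay each other; only repeat visits
to the same host are spaced out. time.monotonic is used rather than
time.time so that wall-clock adjustments can't shorten or lengthen the
delay.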
What I'd like is some form of timer that calls a piece of code every 5
seconds or so, and that code is what goes off and does the crawling.
I'm just not sure of the best way to call code based on a timer.
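For the timer itself, a plain loop that sleeps until the next deadline is
often enough in a single-threaded daemon. The sketch below is one
possible approach, not a definitive implementation. The function name
run_every and its max_runs parameter are hypothetical. Deadlines are
computed from the previous deadline rather than from when the call
finished, so the schedule doesn't drift. If a call overruns the
interval, the schedule restarts from "now" instead of firing a burst of
catch-up calls, which would defeat the point of rate limiting.

```python
import time


def run_every(interval, func, max_runs=None, clock=time.monotonic, sleep=time.sleep):
    """Call func() every `interval` seconds on a fixed schedule.

    Hypothetical helper for illustration. Runs forever if max_runs is
    None; otherwise stops after max_runs calls and returns the count.
    """
    runs = 0
    next_time = clock()
    while True:
        func()
        runs += 1
        if max_runs is not None and runs >= max_runs:
            return runs
        next_time += interval
        now = clock()
        if next_time > now:
            # Sleep only for what is left of this interval.
            sleep(next_time - now)
        else:
            # func() overran the interval: restart the schedule from now
            # rather than making several back-to-back catch-up calls.
            next_time = now
```

Usage would look like run_every(5, crawl_next_url), where crawl_next_url
stands for whatever function fetches the next page. The standard library
also offers sched.scheduler and threading.Timer if a callback style fits
the rest of the program better.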
Could anyone offer some advice on the best way to do this? It will run
on Linux as a service (using the python-daemon library) and will require
Python 3.6 or later.
Thanks for any help.