read all available pages on a Website

Leif K-Brooks eurleif at ecritters.biz
Mon Sep 13 10:09:30 CEST 2004


Tim Roberts wrote:
> Brad Tilley <bradtilley at usa.net> wrote:
> 
>>Is there a way to make urllib or urllib2 read all of the pages on a Web 
>>site?
> By the way, there are many web sites for which this sort of behavior is not
> welcome.

Any site that didn't want to be crawled would most likely publish a 
robots.txt file at its root, so you could check that before doing the crawl.
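
As a rough sketch of that check, something like the following might do, 
using the standard library's robots.txt parser (urllib.robotparser in 
current Python). The robots.txt content, the "example-crawler" user agent, 
the example.com URLs, and the make_checker helper are all made up for 
illustration; the rules are inlined so the snippet runs without touching 
the network.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch runs offline.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def make_checker(robots_txt, user_agent="example-crawler"):
    """Return a function that reports whether user_agent may fetch a URL.

    make_checker and the default user agent name are illustrative only.
    """
    parser = RobotFileParser()
    # parse() takes the robots.txt file as an iterable of lines.
    parser.parse(robots_txt.splitlines())

    def allowed(url):
        return parser.can_fetch(user_agent, url)

    return allowed


allowed = make_checker(SAMPLE_ROBOTS_TXT)
print(allowed("http://example.com/index.html"))
print(allowed("http://example.com/private/data.html"))
```

Against a live site you would probably call set_url() with the site's 
robots.txt address and then read() instead of parse(), and skip any link 
for which can_fetch() returns False before requesting it.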



More information about the Python-list mailing list