read all available pages on a Website

Michael Foord fuzzyman at
Mon Sep 13 15:31:58 CEST 2004

Brad Tilley <bradtilley at> wrote in message news:<ci2qnl$2jq$1 at>...
> Is there a way to make urllib or urllib2 read all of the pages on a Web 
> site? For example, say I wanted to read each page of into 
> separate strings (a string for each page). The problem is that I don't 
> know how many pages are at How can I handle this?
> Thanks,
> Brad

I can highly recommend the BeautifulSoup parser for helping you to
extract all the links - should make it a doddle. (You want to check
that you only follow links that stay on the same site, of course - the
standard library urlparse should help with that).
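As a rough sketch of the link-extraction and same-site filtering step (using only the standard library's html.parser instead of BeautifulSoup, which handles messy real-world HTML far more forgivingly - module names here are the modern Python 3 ones, not the 2.x urllib/urlparse of the original post):

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect href targets from <a> tags.

    BeautifulSoup does this job more robustly on malformed pages;
    this stdlib version just illustrates the idea.
    """

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def same_site_links(base_url, html):
    """Return absolute URLs found in html that stay on base_url's host."""
    parser = LinkExtractor()
    parser.feed(html)
    host = urlparse(base_url).netloc
    result = []
    for href in parser.links:
        # Resolve relative links against the page URL, then keep
        # only those whose host matches the starting site.
        absolute = urljoin(base_url, href)
        if urlparse(absolute).netloc == host:
            result.append(absolute)
    return result
```

A crawl would then be a simple loop: keep a set of seen URLs, fetch each unseen page with urllib (page text into a string, as Brad wants), run same_site_links on it, and queue any new links until none remain.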



More information about the Python-list mailing list