read all available pages on a Website
fuzzyman at gmail.com
Mon Sep 13 15:31:58 CEST 2004
Brad Tilley <bradtilley at usa.net> wrote in message news:<ci2qnl$2jq$1 at solaris.cc.vt.edu>...
> Is there a way to make urllib or urllib2 read all of the pages on a Web
> site? For example, say I wanted to read each page of www.python.org into
> separate strings (a string for each page). The problem is that I don't
> know how many pages are at www.python.org. How can I handle this?
I can highly recommend the BeautifulSoup parser for helping you to
extract all the links - should make it a doddle. (You want to check
that you only follow links that are in www.python.org of course - the
standard library urlparse should help with that).
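As a rough sketch of the idea (using only the standard library rather than BeautifulSoup, and modern Python 3 module names - `html.parser` for link extraction, `urllib.request` for fetching, `urllib.parse` for the same-host check; the `crawl` function and its parameters are my own illustrative names, not anything from urllib itself):

```python
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkParser(HTMLParser):
    """Collect href values from <a> tags in a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start_url, fetch=None, max_pages=100):
    """Fetch every reachable page on the same host; return {url: html}."""
    if fetch is None:
        # default fetcher; pass your own `fetch` to test without a network
        fetch = lambda url: urllib.request.urlopen(url).read().decode(
            "utf-8", "replace")
    host = urlparse(start_url).netloc
    pages = {}
    to_visit = [start_url]
    while to_visit and len(pages) < max_pages:
        url = to_visit.pop()
        if url in pages:
            continue
        try:
            html = fetch(url)
        except OSError:
            continue  # skip pages that fail to load
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # resolve relative links, drop any #fragment
            absolute = urljoin(url, href).split("#")[0]
            # only follow links on the same host
            if urlparse(absolute).netloc == host and absolute not in pages:
                to_visit.append(absolute)
    return pages
```

Each value in the returned dict is one page's HTML as a string, which is what the original question asked for; the `max_pages` cap keeps the crawl from running away on a large site.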