read all available pages on a Website

Tim Roberts timr at
Mon Sep 13 08:13:49 CEST 2004

Brad Tilley <bradtilley at> wrote:

>Is there a way to make urllib or urllib2 read all of the pages on a Web 
>site? For example, say I wanted to read each page of into 
>separate strings (a string for each page). The problem is that I don't 
>know how many pages are at How can I handle this?

You have to parse the HTML to pull out all the links and images and fetch
them, one by one.  sgmllib can help with the parsing.  You can multithread
this, if performance is an issue.
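As a rough sketch of that approach: the class below pulls link and image targets out of a page, and a simple breadth-first loop fetches each discovered page into its own string. This uses html.parser (the modern stand-in for sgmllib, which was removed in Python 3); the URLs, names, and the same-site filter are all illustrative assumptions, not anything from the original post.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from <a href> and <img src> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Map tag name to the attribute that holds its target.
        wanted = {"a": "href", "img": "src"}.get(tag)
        if wanted:
            for name, value in attrs:
                if name == wanted and value:
                    self.links.append(urljoin(self.base_url, value))


def crawl(start_url, max_pages=100):
    """Fetch pages breadth-first; return a dict of {url: page_text}."""
    pages, queue = {}, [start_url]
    while queue and len(pages) < max_pages:
        url = queue.pop(0)
        if url in pages:
            continue
        try:
            text = urlopen(url).read().decode("utf-8", "replace")
        except OSError:
            continue  # skip pages that fail to fetch
        pages[url] = text
        parser = LinkExtractor(url)
        parser.feed(text)
        # Only follow links that stay on the same site.
        queue.extend(u for u in parser.links if u.startswith(start_url))
    return pages
```

To multithread it, the fetch step inside the loop is the part worth handing to a pool of worker threads, since the time is spent waiting on the network rather than parsing.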

By the way, there are many web sites for which this sort of behavior is not
welcome.
- Tim Roberts, timr at
  Providenza & Boekelheide, Inc.
