[Tutor] can I walk or glob a website?

Wed May 18 14:40:25 CEST 2011

Albert-Jan Roskam wrote:

> How can I walk (as in os.walk) or glob a website? I want to download all
> the pdfs from a website (using urllib.urlretrieve), extract certain
> figures (using pypdf- is this flexible enough?) and make some
> statistics/graphs from those figures (using rpy and R). I forgot what the
> process of 'automatically downloading' is called again, something that
> sounds like 'whacking' (??)

If you've downloaded a source distribution of Python, you should have this
little sucker on your hard disk:

http://hg.python.org/cpython/file/31cd146d725c/Tools/webchecker/websucker.py
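If you'd rather roll your own, the core of the job is to fetch a page, pull out the links, keep the ones that point at PDFs, and hand each one to urlretrieve. Below is a minimal sketch for a single page only. It assumes Python 3, where urllib.urlretrieve lives at urllib.request.urlretrieve. The names LinkCollector, find_pdf_links and download_pdfs are made up for this example. To "walk" a whole site you would repeat this for every HTML link that stays on the same host, which is the part a crawler such as websucker.py handles for you.

```python
import os
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; a value may be None
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def find_pdf_links(html, base_url):
    """Hypothetical helper: absolute, de-duplicated URLs of links to .pdf files."""
    parser = LinkCollector()
    parser.feed(html)
    links = []
    for href in parser.hrefs:
        absolute = urljoin(base_url, href)  # resolve relative links against the page URL
        if urlparse(absolute).path.lower().endswith(".pdf") and absolute not in links:
            links.append(absolute)
    return links


def download_pdfs(page_url, dest_dir="pdfs"):
    """Hypothetical helper: fetch one page and save every PDF it links to."""
    with urllib.request.urlopen(page_url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for pdf_url in find_pdf_links(html, page_url):
        filename = os.path.basename(urlparse(pdf_url).path)
        target = os.path.join(dest_dir, filename)
        urllib.request.urlretrieve(pdf_url, target)  # same call as in the question, Python 3 location
        saved.append(target)
    return saved
```

Something like download_pdfs("http://example.com/reports/") (a made-up URL) would leave the files in ./pdfs, ready for the PDF parsing step. Be polite to the server: a real crawler should honour robots.txt and pause between requests.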