[Tutor] python internet archive API?

Switanek, Nick nswitanek at stanford.edu
Thu Apr 26 03:36:18 CEST 2007


I'm a novice Python programmer, and I've been looking for a way to
collect archived web pages. I would like to use the data on Internet
Archive, via the "Wayback Machine". Look, for example, at
http://web.archive.org/web/*/http://www.python.org . I'd like to crawl
down the first few levels of links of each of the updated archived pages
(the ones with *'s next to them). The site's robots.txt disallows all
crawlers, so a screen-scraping strategy doesn't seem doable.
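
To make the goal concrete, here is a rough sketch of the kind of
depth-limited crawl I have in mind. It is only an illustration, not a
working Wayback client: all of the function names are made up for this
example, and `fetch` is a hypothetical callable (URL in, HTML text or None
out) that I would still need to supply. The robots.txt check uses the
standard library's urllib.robotparser.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

WAYBACK_PREFIX = "http://web.archive.org/web/"


def wayback_index_url(url):
    """Build the '*' listing URL for a site, as in the example above."""
    return WAYBACK_PREFIX + "*/" + url


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html, base_url):
    """Return absolute URLs for all links found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    return [urljoin(base_url, href) for href in parser.hrefs]


def is_allowed(robots_txt, url, agent="*"):
    """Check a URL against the text of a robots.txt file."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules.can_fetch(agent, url)


def crawl(start_url, fetch, max_depth=2):
    """Breadth-first crawl that follows links at most max_depth levels.

    fetch is a hypothetical callable: it takes a URL and returns the
    page's HTML as a string, or None if the page is unavailable.
    """
    seen = {start_url}
    queue = deque([(start_url, 0)])
    pages = {}
    while queue:
        url, depth = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        if depth == max_depth:
            continue  # deep enough: keep the page, ignore its links
        for link in extract_links(html, url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return pages
```

With a robots.txt of "User-agent: *" followed by "Disallow: /",
is_allowed() returns False for every URL on the site, which is the
situation I seem to be in.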

Does anyone have any suggestions for a way to go about this
pythonically?

Many thanks,
Nick

 
