[Tutor] python internet archive API?
Switanek, Nick
nswitanek at stanford.edu
Thu Apr 26 03:36:18 CEST 2007
I'm a novice Python programmer, and I've been looking for a way to
collect archived web pages. I would like to use the data in the Internet
Archive, via the "Wayback Machine". Look, for example, at
http://web.archive.org/web/*/http://www.python.org. I'd like to crawl
down the first few levels of links from each of the updated archived pages
(the ones marked with an asterisk). The site's robots.txt excludes
everything, so a screen-scraping strategy doesn't seem doable.
Does anyone have any suggestions for a way to go about this
pythonically?
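To make the goal concrete, here is a minimal sketch of a depth-limited link walk that checks robots.txt rules before each fetch. It uses only the Python 3 standard library (html.parser, urllib.parse, urllib.robotparser). The names LinkCollector, allowed and crawl, the caller-supplied fetch function, and the robots.txt text passed in are all illustrative assumptions, not anything the Internet Archive provides.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))


def allowed(robots_txt, url, agent="*"):
    """Return True if the given robots.txt text permits fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def crawl(start_url, fetch, robots_txt, max_depth=2):
    """Walk links breadth-first, at most max_depth levels below start_url.

    fetch is a hypothetical caller-supplied function mapping a URL to its
    HTML text. Returns every URL discovered, including URLs that were
    found but skipped because robots.txt disallows fetching them.
    """
    seen = {start_url}
    frontier = [start_url]
    for _ in range(max_depth):
        next_frontier = []
        for url in frontier:
            if not allowed(robots_txt, url):
                continue  # respect the exclusion: do not fetch this page
            collector = LinkCollector(url)
            collector.feed(fetch(url))
            for link in collector.links:
                if link not in seen:
                    seen.add(link)
                    next_frontier.append(link)
        frontier = next_frontier
    return seen
```

Passing fetch in as a parameter keeps the walk separate from the download step, so the same loop would work with any page source that is actually permitted. If robots.txt disallows every path, allowed() returns False for the start URL and nothing is fetched, which is the situation described above.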
Many thanks,
Nick