[Tutor] Is a link broken?

Ed Owens eowens0124 at gmx.com
Sat Jan 12 03:47:37 CET 2013


I'm still working through Chun's "Core Python Applications".  I got the
web crawler (Example 9-2) working after I found a typo involving a ':'.
Now I'm trying to turn it into a program that checks for broken links,
which is not in the book.  The problem I'm having now is telling whether
a link is working.

I've written an example that I hope illustrates my problem:

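#!/usr/bin/env python

import urllib2

sites = ('http://www.catb.org', 'http://ons-sa.org', 'www.notasite.org')
for site in sites:
    page = None  # reset, so a failed open can't reuse the previous page
    try:
        page = urllib2.urlopen(site)
        # geturl() gives the final URL after any HTTP redirects
        print page.geturl(), "didn't return error on open"
        # getheader() returns None instead of raising if Server is absent
        print 'Reported server is', page.info().getheader('Server')
    except (urllib2.URLError, ValueError) as e:
        # URLError: HTTP error status or no connection at all;
        # ValueError: no scheme, as in 'www.notasite.org'
        print site, 'generated an error on open:', e
    if page is not None:
        page.close()
        print site, 'successfully closed'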
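
For comparison, here is a minimal sketch of the same check in Python 3, where urllib2 was split into urllib.request and urllib.error. It separates the three outcomes the loop above can hit: the server answered with an error status, nothing answered at all, or the string was not a usable URL. The function name check_link and its opener parameter are hypothetical, added only so the logic can be tried without network access.

```python
# Sketch for Python 3: urllib2's functionality now lives in
# urllib.request and urllib.error.
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def check_link(url, opener=urlopen, timeout=10):
    """Describe what happened when trying to open url.

    opener defaults to urllib.request.urlopen; it is a parameter only so
    the function can be exercised without network access.
    """
    try:
        page = opener(url, timeout=timeout)
    except HTTPError as e:
        # HTTPError must be caught first: it is a subclass of URLError.
        # The server answered, but with an error status such as 404.
        return 'HTTP error %d' % e.code
    except URLError as e:
        # No HTTP answer: DNS failure, refused connection and the like.
        return 'unreachable (%s)' % e.reason
    except ValueError as e:
        # urlopen rejects strings with no scheme, e.g. 'www.notasite.org'.
        return 'invalid URL (%s)' % e
    with page:
        # page.url is the address after any HTTP redirects were followed.
        return 'opened %s (status %s)' % (page.url, page.status)
```

Calling print(site, '->', check_link(site)) inside the original for loop, in place of the try/except pairs, prints one line per site.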


Site 1 is alive and the other two are dead, yet this code reports an
error only for site three.  Notice that I printed page.geturl() to check
(I think) whether the site had been redirected once it opened, and that
didn't help with site two.
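
A note on the redirect check: geturl() reports the URL after urllib has followed HTTP 3xx redirects, so it only reveals server-side redirects. It cannot reveal a page that answers 200 directly, or one that "redirects" with an HTML meta refresh or JavaScript, which urllib does not follow. Comparing raw URL strings would also flag harmless changes such as an added trailing slash. A minimal Python 3 sketch that compares host names instead follows; the helper names are hypothetical, and treating "www.example.org" and "example.org" as the same site is an assumption.

```python
from urllib.parse import urlsplit


def _host(url):
    # urlsplit().hostname is already lower-cased. Dropping a leading
    # 'www.' is an assumption of this sketch, not a standard rule.
    host = urlsplit(url).hostname or ''
    return host[4:] if host.startswith('www.') else host


def redirected_to_other_host(requested, final):
    """True if final (e.g. page.geturl() after opening requested) is on a
    different host, which suggests the link now leads somewhere else."""
    return _host(requested) != _host(final)
```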

Is there an unambiguous way to determine if a link has died -- knowing 
nothing about the link in advance?
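
There may be no fully unambiguous test, because something can answer for a dead site with an ordinary 200 status (for example a domain-parking page, or an ISP's DNS search page). The output above does not show whether that is what happened with site two. One common heuristic, often called soft-404 detection, is to request a random path on the same host: a healthy server should return an error such as 404, so a success there means a 200 from that host proves little. A minimal Python 3 sketch under that assumption follows; the function name and the opener parameter (present only for offline testing) are hypothetical.

```python
import uuid
from urllib.error import HTTPError, URLError
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def answers_everything(url, opener=urlopen, timeout=10):
    """Heuristic 'soft 404' probe for the host of url.

    Returns True if the host also reports success for a random path that
    should not exist, False if it returns an HTTP error status for that
    path, and None if the host cannot be reached at all.
    """
    parts = urlsplit(url)
    # A random path is assumed not to exist on any real site.
    probe = urlunsplit((parts.scheme, parts.netloc,
                        '/' + uuid.uuid4().hex, '', ''))
    try:
        with opener(probe, timeout=timeout) as page:
            # A redirect to the home page also ends here with status 200.
            return 200 <= page.status < 300
    except HTTPError:
        # An error status (typically 404) is what a normal server sends.
        return False
    except URLError:
        # Nothing answered, so the probe tells us nothing either way.
        return None
```

A True result does not prove the original link is dead; it only means a successful response from that host should not be taken as evidence that it is alive.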

Ed