[Tutor] open a webpage which may be unavailable

Kent Johnson kent37 at tds.net
Thu Oct 18 13:35:19 CEST 2007


pileux systeme wrote:
> Hello,
>  
> I am trying to retrieve data from several webpages. My problem is the 
> following: after a random number of requests, the page I'm trying to 
> open is unavailable (and I get an IOError). Note that the page may 
> become available if I try again after some time. Since I have thousands 
> of pages to explore, I'd like to be able to continue the program in spite 
> of this error.
> I've thought of trying to raise an exception such as:
> try:
>   usock = urllib.urlopen('http:// <etc>')
> except:
> < something >
> else:
>   usock = urllib.urlopen('http:// <etc>')

I'm confused about the code above. First, it is catching an exception, 
not raising it. Second, the 'else' clause, which will run if there is no 
exception, seems to do the same thing as the 'try' clause; I don't 
understand your intent.

> However, this doesn't work because the page can become unavailable 
> between the time when I run the 'try' and the 'else'. [for instance, 
> assume that my internet connection stops for a couple seconds every 
> random amount of time].
>  
> Would anyone know how to solve this problem?
>  
> [Another way to look at it is as follows: I'd like to be able to check 
> whether the page is available AND copy it if it is AT THE SAME TIME]

There are several steps to fetching a web page. A socket connection is 
established, a GET request is sent to the server and the response is 
read from the socket, then the socket is closed. So it is not possible 
to do all this at the same time.
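
To make those steps concrete, here is a rough sketch of a fetch done by 
hand with a plain socket. The function name fetch_raw and its parameters 
are made up for illustration, and it speaks bare HTTP/1.0 with no 
redirect or error handling, so treat it as a picture of the sequence 
rather than something to use in place of urllib. Any of the numbered 
steps can fail on its own.

```python
import socket

def fetch_raw(host, port=80, path="/", timeout=10.0):
    """Hypothetical helper: fetch one page by hand and return the raw bytes."""
    # Step 1: establish the socket connection (may fail if the host is down).
    sock = socket.create_connection((host, port), timeout=timeout)
    try:
        # Step 2: send a GET request to the server.
        request = "GET %s HTTP/1.0\r\nHost: %s\r\n\r\n" % (path, host)
        sock.sendall(request.encode("ascii"))
        # Step 3: read the response until the server closes its end.
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    finally:
        # Step 4: close the socket, whether or not the read succeeded.
        sock.close()
    # The result holds the status line and headers followed by the body.
    return b"".join(chunks)
```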

What you can do is wrap the entire operation in a single try/except 
handler. I think you just need something like this:

try:
   f = urllib.urlopen('http:// <etc>')
   data = f.read()
   f.close()
   # do something with data
except:
   import traceback
   traceback.print_exc()

I put code in the exception handler to print a stack trace; this will 
help figure out where the errors are coming from.
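
Since you have thousands of pages and a page may come back after a 
while, one possible extension is to retry each page a few times and 
carry on with the next one if it still fails. The sketch below is only 
one way to do that. The names fetch_with_retries and fetch_all, the 
retry count and the delay are all my own assumptions. It is written for 
Python 3, where urlopen lives in urllib.request and IOError is another 
name for OSError; in Python 2 you would keep calling urllib.urlopen.

```python
import time
import traceback
from urllib.request import urlopen  # Python 3 location of urlopen

def fetch_with_retries(url, opener=urlopen, retries=3, delay=5.0):
    """Hypothetical helper: return the page body, or None if all attempts fail."""
    for attempt in range(1, retries + 1):
        try:
            # Open, read and close inside one try block, so a failure at
            # any step of the fetch is caught by the same handler.
            f = opener(url)
            try:
                return f.read()
            finally:
                f.close()
        except IOError:
            # Show where the error came from, then wait and try again.
            traceback.print_exc()
            if attempt < retries:
                time.sleep(delay)
    return None

def fetch_all(urls, **kwargs):
    """Hypothetical helper: fetch every URL and keep going past failures."""
    results = {}
    failed = []
    for url in urls:
        data = fetch_with_retries(url, **kwargs)
        if data is None:
            failed.append(url)  # remember it so it can be retried later
        else:
            results[url] = data
    return results, failed
```

The opener argument is there only so the function can be tried out 
without a network connection by passing in a stand-in function. 
Catching IOError rather than using a bare except means that a typo in 
your own code, or Ctrl-C, will still stop the program.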

Kent

