[Tutor] open a webpage which may be unavailable
Kent Johnson
kent37 at tds.net
Thu Oct 18 13:35:19 CEST 2007
pileux systeme wrote:
> Hello,
>
> I am trying to retrieve data from several webpages. My problem is the
> following: after a random number of requests, the page I'm trying to
> open is unavailable (and I get an IOError). Note that the page may
> become available if I try again after some time. Since I have thousands
> of pages to explore, I'd like to be able to continue the program in
> spite of this error.
> I've thought of trying to raise an exception such as:
> try:
>     usock = urllib.urlopen('http:// <etc>')
> except:
>     < something >
> else:
>     usock = urllib.urlopen('http:// <etc>')
I'm confused about the code above. First, it is catching an exception,
not raising it. Second, the 'else' clause, which will run only if there
is no exception, seems to do the same thing as the 'try' clause; I don't
understand your intent.
> However, this doesn't work because the page can become unavailable
> between the time when I run the 'try' and the 'else'. [for instance,
> assume that my internet connection drops for a couple of seconds at
> random intervals].
>
> Would anyone know how to solve this problem?
>
> [Another way to look at it is as follows: I'd like to be able to check
> whether the page is available AND copy it if it is AT THE SAME TIME]
There are several steps to fetching a web page. A socket connection is
established, a GET request is sent to the server and the response is
read from the socket, then the socket is closed. So it is not possible
to do all this at the same time.
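Those steps can be sketched at the socket level. This is a minimal illustration in modern Python 3 (the thread itself uses the old Python 2 urllib); the `build_get_request` and `fetch` helpers are hypothetical names, not anything from the standard library:

```python
import socket

def build_get_request(host, path="/"):
    # Compose a minimal HTTP/1.0 GET request (hypothetical helper).
    return ("GET %s HTTP/1.0\r\nHost: %s\r\n\r\n" % (path, host)).encode("ascii")

def fetch(host, path="/", timeout=10):
    # The steps described above: open a socket connection, send the GET
    # request, read the response until the server closes, then close.
    with socket.create_connection((host, 80), timeout=timeout) as sock:
        sock.sendall(build_get_request(host, path))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # server closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

Any of these steps can fail independently (connect, send, read), which is why the failure cannot be checked and the copy made "at the same time".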
What you can do is wrap the entire operation in a single try/except
handler. I think you just need something like this:
try:
    f = urllib.urlopen('http:// <etc>')
    data = f.read()
    f.close()
    # do something with data
except:
    import traceback
    traceback.print_exc()
I put code in the exception handler to print a stack trace; this will
help you figure out where the errors are coming from.
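Since the pages may become available again after some time, you could also wrap the whole operation in a retry loop that sleeps between attempts. A minimal sketch in Python 3 spelling (`urllib.request.urlopen` instead of the old `urllib.urlopen`); the `fetch_with_retry` helper and its parameters are made up for illustration:

```python
import time
import urllib.request

def fetch_with_retry(url, attempts=3, delay=2.0, opener=urllib.request.urlopen):
    # Try the whole fetch up to `attempts` times, sleeping `delay`
    # seconds after each failure; the `opener` parameter exists so the
    # function can be tested without a network connection.
    for attempt in range(1, attempts + 1):
        try:
            f = opener(url)
            try:
                return f.read()
            finally:
                f.close()
        except IOError:
            if attempt == attempts:
                raise          # give up after the last attempt
            time.sleep(delay)  # wait, then try the page again
```

With thousands of pages you would call this once per URL inside your loop, catching the final IOError so one permanently dead page does not stop the run.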
Kent