404 errors
Ivan Karajas
my_full_name_concatenated at myrealbox.com
Thu Apr 29 03:03:54 EDT 2004
On Tue, 27 Apr 2004 10:46:47 +0200, Tut wrote:
> Tue, 27 Apr 2004 11:00:57 +0800, Derek Fountain wrote:
>
>> Some servers respond with a nicely formatted bit of HTML explaining the
>> problem, which is fine for a human, but not for a script. Is there some
>> flag or something definitive on the response which says "this is a 404
>> error"?
>
> Maybe catch the urllib2.HTTPError?
This kind of answers the question. urllib will let you read whatever it
receives, regardless of the HTTP status; you need to use urllib2 if you
want to find out the status code when a request results in an error (any
HTTP status beginning with a 4 or 5). This can be done like so:
import urllib2

try:
    asock = urllib2.urlopen("http://www.foo.com/qwerty.html")
except urllib2.HTTPError, e:
    # Raised for 4xx/5xx responses; e.code holds the numeric status.
    print e.code
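For anyone reading this with Python 3: urllib2 was split into urllib.request and urllib.error there, but the idea is the same. Below is a minimal sketch, not a definitive implementation. fetch_status(), DemoHandler and start_demo_server() are hypothetical names, and the throwaway local server exists only so the example runs without touching a real site.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def fetch_status(url):
    """Return the HTTP status code for url, error or not (hypothetical helper)."""
    try:
        with urllib.request.urlopen(url) as resp:
            # Success (2xx); any redirects have already been followed.
            return resp.status
    except urllib.error.HTTPError as e:
        # 4xx/5xx responses are raised; e.code is the numeric status, and
        # e.read() would still return the server's HTML error page.
        return e.code


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical test server: "/" exists, every other path is a 404."""

    def do_GET(self):
        if self.path == "/":
            code, body = 200, b"<html><body>Hello</body></html>"
        else:
            code, body = 404, b"<html><body>Sorry, no such page.</body></html>"
        self.send_response(code)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def start_demo_server():
    """Serve DemoHandler on a free local port; return (server, base_url)."""
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, "http://127.0.0.1:%d" % server.server_address[1]


if __name__ == "__main__":
    server, base = start_demo_server()
    print(fetch_status(base + "/"))             # 200
    print(fetch_status(base + "/qwerty.html"))  # 404
    server.shutdown()
    server.server_close()
```

Returning the code instead of letting the exception escape keeps the caller's logic simple: one integer to test, regardless of whether the server sent a pretty HTML error page.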
The value in urllib2.HTTPError.code comes from the status line, the first
line of the web server's HTTP response, just before the headers begin, e.g.
"HTTP/1.1 200 OK" or "HTTP/1.1 404 Not Found".
One thing you need to be aware of is that some web sites don't behave as
you would expect them to; e.g. responding with a redirection rather than a
404 error when you request a page that doesn't exist. In these
cases you might still have to rely on some clever scripting.
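One possible piece of that scripting, sketched for Python 3: urlopen follows 3xx redirects silently, so compare the URL you asked for with the URL the response finally came from (resp.geturl()). If they differ, the server redirected you and the page may not exist. This is only a heuristic, since legitimate pages get redirected too. looks_redirected(), RedirectingHandler and start_demo_server() are hypothetical, and the local server's "redirect unknown pages to /" behaviour is an assumed stand-in for a badly behaved site.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def looks_redirected(url):
    """Return (final_url, redirected) for url (hypothetical helper).

    A 4xx/5xx response still raises urllib.error.HTTPError here.
    """
    with urllib.request.urlopen(url) as resp:
        final_url = resp.geturl()  # URL after any redirects were followed
    return final_url, final_url != url


class RedirectingHandler(BaseHTTPRequestHandler):
    """Hypothetical badly behaved site: unknown pages redirect to "/"."""

    def do_GET(self):
        if self.path == "/":
            body = b"<html><body>Home page</body></html>"
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            # Instead of a 404, send the client back to the home page.
            self.send_response(302)
            self.send_header("Location", "/")
            self.send_header("Content-Length", "0")
            self.end_headers()

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def start_demo_server():
    """Serve RedirectingHandler on a free local port; return (server, base_url)."""
    server = HTTPServer(("127.0.0.1", 0), RedirectingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, "http://127.0.0.1:%d" % server.server_address[1]


if __name__ == "__main__":
    server, base = start_demo_server()
    print(looks_redirected(base + "/")[1])             # False
    print(looks_redirected(base + "/qwerty.html")[1])  # True
    server.shutdown()
    server.server_close()
```

If you would rather see the 3xx code itself, another option is a urllib.request.HTTPRedirectHandler subclass whose redirect_request() returns None; urllib then reports the redirect as an HTTPError carrying the 3xx code.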
Cheers,
Ivan
More information about the Python-list mailing list