[Tutor] Forbidden HTML, or not?

Barry Sperling barry at angleinc.com
Wed Nov 3 18:07:55 CET 2004


I used urllib2.urlopen to fetch the HTML of a web page. It worked for 
the first few pages I tried, but not for Google News:

http://news.google.com/nwshp?hl=en&gl=us

This URL raised an error in the Interactive Window of PythonWin. The 
last part of the traceback was:

line 306, in _call_chain
     result = func(*args)
   File "D:\Python23\lib\urllib2.py", line 412, in http_error_default
     raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 403: Forbidden

I took that to mean the site has some setting that prevents its HTML 
from being read.  However, when I view the page source in Mozilla with 
Ctrl-U, the HTML shows up in its entirety.
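
A 403 is the server's reply to one particular request, not a property
of the page, which would be consistent with the same URL working in a
browser. A minimal sketch for reading the status from code instead of
from the traceback, assuming Python 2.6+ or 3 (hence the two import
spellings). The names summarize_http_error and try_fetch are
hypothetical helpers, not part of urllib2:

```python
try:
    import urllib2 as urlrequest          # Python 2 name, as used in the post
    from urllib2 import HTTPError
except ImportError:
    import urllib.request as urlrequest   # same API under its Python 3 name
    from urllib.error import HTTPError


def summarize_http_error(err):
    """Return the status code and reason carried by an HTTPError."""
    return "%d %s" % (err.code, err.msg)


def try_fetch(url):
    """Return (body, None) on success, or (None, summary) on an HTTP error."""
    try:
        response = urlrequest.urlopen(url)
    except HTTPError as err:
        # The server answered, but with an error status such as 403.
        return None, summarize_http_error(err)
    try:
        return response.read(), None
    finally:
        response.close()
```

Calling try_fetch on the Google News URL should then hand back the
status summary as a string rather than a traceback.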

What do I need to do to get the same result from my code? The relevant 
part is below:

import urllib2   # OPENING AND READING HTML
import re        # SEARCHING THE HTML

# THIS GIVES "FORBIDDEN" WITH AND WITHOUT THE "?hl=en&gl=us" ATTACHED
source = urllib2.urlopen('http://news.google.com/nwshp')

html_text = source.read()

	Barry
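
One plausible cause, which nothing in the post confirms: by default
urllib2 identifies itself with a "Python-urllib/x.y" User-Agent header,
and some servers refuse requests that do not look like a browser's. A
minimal sketch of sending an explicit User-Agent, assuming Python 2.6+
or 3. The header string and the names build_request and fetch_html are
hypothetical, chosen only for illustration:

```python
try:
    import urllib2 as urlrequest          # Python 2 name, as used in the post
except ImportError:
    import urllib.request as urlrequest   # same API under its Python 3 name

# Hypothetical browser-like identifier; the exact string is an assumption.
BROWSER_USER_AGENT = "Mozilla/5.0 (compatible; tutor-example)"


def build_request(url, user_agent=BROWSER_USER_AGENT):
    """Return a Request for url that carries an explicit User-Agent header."""
    return urlrequest.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url):
    """Open url with the custom header and return the raw response body."""
    response = urlrequest.urlopen(build_request(url))
    try:
        return response.read()
    finally:
        response.close()
```

Here fetch_html('http://news.google.com/nwshp?hl=en&gl=us') would stand
in for the urlopen(...).read() pair above. The server may still refuse
the request, and a site's terms of use may restrict automated fetching.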



