urlopen returns forbidden

Chris Rebert clp2 at rebertia.com
Mon Feb 28 07:19:18 CET 2011

On Sun, Feb 27, 2011 at 9:38 PM, monkeys paw <monkey at joemoney.net> wrote:
> I have a working urlopen routine which opens
> a url, parses it for <a> tags and prints out
> the links in the page. On some sites, Wikipedia for
> instance, I get an
> HTTP 403 Forbidden error.
> What is the difference between accessing the site through a web browser
> and opening/reading the URL with Python's urllib2.urlopen?

The User-Agent header (http://en.wikipedia.org/wiki/User_agent ).
"By default, the URLopener class sends a User-Agent header of
urllib/VVV, where VVV is the urllib version number."
    – http://docs.python.org/library/urllib.html

Some sites block obvious non-search-engine bots based on their HTTP
User-Agent header value.

You can override the urllib default:
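
A minimal sketch of one way to do that, written against Python 3's
urllib.request (the successor to urllib2; under Python 2 the same idea is
urllib2.Request(url, headers=...)). The User-Agent string and the helper
names build_request and fetch are made up for illustration, so substitute
something that identifies your own script.

```python
import urllib.request

# Hypothetical identifier; replace it with one that describes your script.
USER_AGENT = "MyLinkLister/0.1 (contact: you@example.com)"


def build_request(url, user_agent=USER_AGENT):
    """Return a Request object that carries a custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent=USER_AGENT):
    """Open the URL with the custom header and return the body as bytes."""
    with urllib.request.urlopen(build_request(url, user_agent)) as response:
        return response.read()
```

Calling fetch("http://en.wikipedia.org/wiki/User_agent") should then send
your own User-Agent instead of the library default. Whether a given site
accepts it is up to that site's policy.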

Sidenote: Wikipedia has a proper API for programmatic access, which is
likely why it's blocking your program.

