[Tutor] Error 403 when accessing wikipedia articles?

Alex Ryu ryu.alex at gmail.com
Sat Oct 27 08:53:01 CEST 2007


Hi all,
I'm trying to use Python to automatically download and process a (small)
number of Wikipedia articles.  However, I keep getting a 403 (Forbidden)
error when using urllib2:

>>> import urllib2
>>> ip = urllib2.urlopen("http://en.wikipedia.org/wiki/Pythonidae")
which gives this:

Traceback (most recent call last):
  File "<pyshell#2>", line 1, in <module>
    ip = urllib2.urlopen("http://en.wikipedia.org/wiki/Pythonidae")
  File "G:\Python25\lib\urllib2.py", line 121, in urlopen
    return _opener.open(url, data)
  File "G:\Python25\lib\urllib2.py", line 380, in open
    response = meth(req, response)
  File "G:\Python25\lib\urllib2.py", line 491, in http_response
    'http', request, response, code, msg, hdrs)
  File "G:\Python25\lib\urllib2.py", line 418, in error
    return self._call_chain(*args)
  File "G:\Python25\lib\urllib2.py", line 353, in _call_chain
    result = func(*args)
  File "G:\Python25\lib\urllib2.py", line 499, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 403: Forbidden

Now, when I use urllib instead of urllib2, something different happens:

>>> import urllib
>>> ip2 = urllib.urlopen("http://en.wikipedia.org/wiki/Pythonidae")
>>> st = ip2.read()

However, st does not contain the hoped-for page. Instead it is a page of
HTML and (maybe?) JavaScript, which ends in:

If reporting this error to the Wikimedia System Administrators, please
include the following details:<br/>\n<span style="font-style:
italic">\nRequest: GET http://en.wikipedia.org/wiki/Pythonidae, from
98.195.188.89 via sq27.wikimedia.org (squid/2.6.STABLE13) to
()<br/>\nError: ERR_ACCESS_DENIED, errno [No Error] at Sat, 27 Oct 2007
06:45:00 GMT\n</span>\n</div>\n\n</body>\n</html>\n'
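
Both results look like the same refusal seen two ways: urllib2 raises on the
403, while urllib hands back the error page as if it were the article. A
minimal sketch of how the status code and the error body could be inspected
together is below. It is written against urllib.request, the Python 3
successor of urllib2, which is an assumption about the interpreter in use.
The names fetch and fake_forbidden are hypothetical, and fake_forbidden only
stands in for the network so the sketch runs offline. The user_agent
parameter is there purely to test a guess that the server may treat the
default Python User-Agent differently; nothing in the output above confirms
that.

```python
import io
import urllib.request
from urllib.error import HTTPError

URL = "http://en.wikipedia.org/wiki/Pythonidae"


def fetch(url, user_agent=None, opener=urllib.request.urlopen):
    """Return (status_code, body_bytes), even when the server sends an error."""
    headers = {}
    if user_agent is not None:
        # Assumption: the server may react to the User-Agent header.
        headers["User-Agent"] = user_agent
    request = urllib.request.Request(url, headers=headers)
    try:
        response = opener(request)
        return response.getcode(), response.read()
    except HTTPError as err:
        # HTTPError doubles as a file-like response, so the error page
        # that came with the 403 can still be read.
        return err.code, err.read()


def fake_forbidden(request):
    """Hypothetical stand-in for the network: always answers 403."""
    body = io.BytesIO(b"Error: ERR_ACCESS_DENIED")
    raise HTTPError(request.full_url, 403, "Forbidden", {}, body)


status, body = fetch(URL, opener=fake_forbidden)
print(status, body)
```

Dropping the opener argument makes the same call go to the real server, and
passing something like user_agent="my-script/0.1" (a made-up value) would
show whether the response changes.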

Could anybody tell me what's going on, and what I should be doing
differently?
Thanks for your time
Alex