[Tutor] Error 403 when accessing wikipedia articles?
Alan Gauld
alan.gauld at btinternet.com
Sat Oct 27 10:19:21 CEST 2007
"Alex Ryu" <ryu.alex at gmail.com> wrote
> I'm trying to use python to automatically download and process a
> (small) number of wikipedia articles. However, I keep getting a 403
> (Forbidden Error), when using urllib2:
FWIW I had a similar problem in trying to use Google to illustrate
the use of urllib2 in my tutorial. It seems some web sites implement
measures to prevent robotic access. I assume you could spoof your
browser's characteristics and fool the system, but I tend to take the
view that if the owner doesn't like robots then I'd better respect
that, so I haven't tried.
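For the record, a 403 like this usually just means the server rejected
urllib's default User-Agent. Wikipedia's servers, as I understand it,
prefer that scripts send a *descriptive* User-Agent identifying the
client, rather than a spoofed browser string. A minimal sketch using
today's urllib.request (urllib2's successor in Python 3); the URL and
agent string are made up for illustration:

```python
import urllib.request  # urllib2's functionality lives here in Python 3

# Hypothetical article URL, for illustration only.
url = "https://en.wikipedia.org/wiki/Python_(programming_language)"

# Supply a descriptive User-Agent that honestly identifies the script,
# instead of impersonating a browser.
req = urllib.request.Request(
    url,
    headers={"User-Agent": "MyArticleFetcher/0.1 (contact: me@example.com)"},
)

# urllib stores header names capitalized, so the key is 'User-agent'.
print(req.get_header("User-agent"))

# The actual fetch would then be:
#   with urllib.request.urlopen(req) as resp:
#       text = resp.read().decode("utf-8")
```

Whether you *should* do this is another matter, per the point above
about respecting the site owner's wishes.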
All of which reminds me that I really need to finish writing
that topic! :-)
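One way to respect a site's wishes programmatically is to check its
robots.txt before fetching. A small sketch using the standard library's
urllib.robotparser; the robots.txt content and URLs here are invented
for the example (in real use you would point set_url() at the site's
actual robots.txt and call read()):

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt, parsed directly for demonstration.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# Paths outside the disallowed prefix are fetchable; those inside are not.
print(rp.can_fetch("MyBot", "https://example.com/articles/foo"))  # True
print(rp.can_fetch("MyBot", "https://example.com/private/foo"))   # False
```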
> File "G:\Python25\lib\urllib2.py", line 499, in http_error_default
> raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
> HTTPError: HTTP Error 403: Forbidden
>
> Now, when I use urllib instead of urllib2, something different
> happens:
>
> from 98.195.188.89 via sq27.wikimedia.org (squid/2.6.STABLE13) to ()
> Error: ERR_ACCESS_DENIED, errno [No Error] at Sat, 27 Oct 2007
HTH,
--
Alan Gauld
Author of the Learn to Program web site
http://www.freenetpages.co.uk/hp/alan.gauld