[Tutor] fetching wikipedia articles

Andre Engels andreengels at gmail.com
Fri Jan 23 11:25:43 CET 2009


On Fri, Jan 23, 2009 at 10:37 AM, amit sethi <amit.pureenergy at gmail.com> wrote:
> so is there a way around that problem ??

Ok, I have done some checking around, and it seems that the Wikipedia
server is returning a 403 (Forbidden) status code but still serving
the page - which I think is weird behaviour. I will ask the Wikimedia
developers why this is done, but for now you can work around it by
editing robotparser.py in the following way:

At the end of the __init__ of the URLopener class, add the following:

self.addheaders = [header for header in self.addheaders
                   if header[0] != "User-Agent"] + [('User-Agent', '<whatever>')]

(probably

self.addheaders = [('User-Agent', '<whatever>')]

does the same, but my version is more secure)
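For reference, in current Python the same effect can be had without editing the standard library file at all: urllib.robotparser fetches robots.txt through urllib.request, so installing a global opener with a replacement User-Agent reaches it too. A minimal sketch (the "MyBot/1.0" agent string and the Wikipedia URL are placeholders, not anything from the original post):

```python
import urllib.request
import urllib.robotparser

# Build an opener and swap out the default "Python-urllib/x.y"
# User-Agent header (note: the stock header key is spelled
# "User-agent", so compare case-insensitively).
opener = urllib.request.build_opener()
opener.addheaders = [h for h in opener.addheaders
                     if h[0].lower() != "user-agent"]
opener.addheaders += [("User-Agent", "MyBot/1.0")]

# Make this opener the one urlopen() - and thus robotparser - uses.
urllib.request.install_opener(opener)

rp = urllib.robotparser.RobotFileParser()
rp.set_url("https://en.wikipedia.org/robots.txt")
# rp.read()  # would now fetch robots.txt with the custom User-Agent
```

Filtering the existing header list first, as in the snippet above, mirrors the "more secure" variant in the post: it guarantees no stale User-Agent entry survives alongside the new one.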

-- 
André Engels, andreengels at gmail.com
