[Tutor] fetching wikipedia articles
Andre Engels
andreengels at gmail.com
Fri Jan 23 11:25:43 CET 2009
On Fri, Jan 23, 2009 at 10:37 AM, amit sethi <amit.pureenergy at gmail.com> wrote:
> so is there a way around that problem ??
Ok, I have done some checking around, and it seems that the Wikipedia
server is returning a 403 (Forbidden) status code but still serving
the page - which I think is weird behaviour. I will ask the Wikimedia
developers why this is done, but for now you can work around it by
editing robotparser.py in the following way:
In the __init__ of the class URLopener, add the following at the end:

    self.addheaders = [header for header in self.addheaders
                       if header[0] != "User-Agent"] + [('User-Agent', '<whatever>')]

(probably

    self.addheaders = [('User-Agent', '<whatever>')]

does the same, but my version is safer, since it keeps the other
default headers intact)
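To make the difference concrete, here is a small self-contained sketch of
what that list comprehension does. The starting header list and the agent
string "MyBot/1.0" are made-up examples (the original leaves the agent as
'<whatever>'); the point is only that the existing User-Agent entry is
dropped and replaced while any other headers survive:

    def replace_user_agent(addheaders, agent):
        """Drop any existing User-Agent pair, append the new one,
        and leave all other headers (e.g. Accept) untouched."""
        return ([h for h in addheaders if h[0] != "User-Agent"]
                + [("User-Agent", agent)])

    # Hypothetical default headers, for illustration only.
    headers = [("User-Agent", "Python-urllib/1.17"), ("Accept", "*/*")]
    print(replace_user_agent(headers, "MyBot/1.0"))
    # [('Accept', '*/*'), ('User-Agent', 'MyBot/1.0')]

The one-line assignment (self.addheaders = [('User-Agent', ...)]) would
instead throw away the Accept header along with everything else.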
--
André Engels, andreengels at gmail.com