[Tutor] fetching wikipedia articles

Andre Engels andreengels at gmail.com
Thu Jan 22 18:15:09 CET 2009


On Thu, Jan 22, 2009 at 6:08 PM, amit sethi <amit.pureenergy at gmail.com> wrote:
> Hi, I need help with fetching a Wikipedia article. I tried changing
> my user agent, but it did not work. As far as I understand robots.txt,
> en.wikipedia.org/robots.txt should not block a user agent (*, which is
> what I would normally use) from accessing a simple article such as
> "http://en.wikipedia.org/wiki/Sachin_Tendulkar", but robotparser still
> returns False:
> status=rp.can_fetch("*", "http://en.wikipedia.org/wiki/Sachin_Tendulkar")
> where rp is a robot parser object. Why is that?

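Yes, Wikipedia is blocking the Python default user agent. This was
done to block the main internal bot in its early days (it was
misbehaving by fetching each page twice). By the time the bot was
allowed again, it had already switched to its own user agent string,
and apparently nobody considered it necessary to unblock the default
string...

That is also why robotparser says False. rp.read() downloads
robots.txt using that same default user agent, the server answers
403 Forbidden, and robotparser treats a 401 or 403 response as
"disallow everything". After that, can_fetch() returns False for every
URL, whatever robots.txt actually says. Changing the user agent you
pass to can_fetch() does not change the one used to download the file.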




--
André Engels, andreengels at gmail.com

