How do I correctly download Wikipedia pages?

Taskinoor Hasan taskinoor.hasan at csebuet.org
Wed Nov 25 23:37:39 EST 2009


I faced a different problem. Whenever I tried to fetch any page from
Wikipedia, I received a 403 (Forbidden) error. Then I found that Wikipedia
doesn't accept the default User-Agent (probably Python-urllib/2.x or something
similar). After setting my own User-Agent, it worked fine. You can try this if
you receive a 403.
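A minimal sketch of the User-Agent fix described above, written for Python 3's urllib.request (the successor of urllib2) rather than the Python 2 code of the time. The USER_AGENT string, its contact address, and the helper names build_request and fetch_page are hypothetical placeholders; Wikimedia's User-Agent policy asks clients to send a descriptive agent with contact information, so substitute your own.

```python
import urllib.request

# Hypothetical descriptive User-Agent: replace the tool name and contact
# address with your own before using it against Wikipedia.
USER_AGENT = "MyWikiFetcher/0.1 (contact: you@example.com)"


def build_request(url, user_agent=USER_AGENT):
    """Return a Request that sends our own User-Agent instead of Python's default."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_page(url, user_agent=USER_AGENT, timeout=30):
    """Download url and return the body decoded as text.

    Raises urllib.error.HTTPError (for example, 403 Forbidden) if the
    server still refuses the request.
    """
    req = build_request(url, user_agent)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # Use the charset the server declares, falling back to UTF-8.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)
```

Usage: html = fetch_page("https://en.wikipedia.org/wiki/Main_Page"). Note that urllib stores header names via str.capitalize(), so the header is looked up afterwards as req.get_header("User-agent").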

On Thu, Nov 26, 2009 at 10:04 AM, Stephen Hansen <apt.shansen at gmail.com> wrote:

>
>
> 2009/11/25 Steven D'Aprano <steven at remove.this.cybersource.com.au>
>
> I'm trying to scrape a Wikipedia page from Python. Following instructions
>> here:
>>
>>
> Have you checked out http://meta.wikimedia.org/wiki/Pywikipediabot?
>
> It's not just via urllib, but I've scraped several MediaWiki-based sites
> with the software successfully.
>
> --S
>


More information about the Python-list mailing list