How do I correctly download Wikipedia pages?

ShoqulKutlu kursat.kutlu at gmail.com
Wed Nov 25 22:58:57 EST 2009


Hi,

Try not to be caught if you send multiple requests :)

Have a look here: http://wolfprojects.altervista.org/changeua.php
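
The linked page appears to be about changing the User-Agent header that urllib sends. Here is a minimal sketch of that idea, assuming the error page comes from the server rejecting urllib's default User-Agent. It is written for Python 3 (urllib.request); the quoted session is Python 2, where urllib2.Request plays the same role. The names USER_AGENT, build_request and fetch, and the agent string itself, are made up for illustration.

```python
import urllib.request

# Hypothetical identifying string (an assumption, not from the thread);
# replace it with your own tool name and contact details.
USER_AGENT = "ExampleWikiFetcher/0.1 (contact: you@example.com)"


def build_request(url, user_agent=USER_AGENT):
    """Return a Request object that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent=USER_AGENT):
    """Download url and return the raw response body as bytes."""
    with urllib.request.urlopen(build_request(url, user_agent)) as response:
        return response.read()
```

Calling fetch("http://en.wikipedia.org/wiki/Special:Export/Train") should then send the request with the custom header instead of the default one. If you fetch several pages, it is probably wise to pause between requests.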

Regards
Kutlu

On Nov 26, 5:45 am, Steven D'Aprano
<ste... at REMOVE.THIS.cybersource.com.au> wrote:
> I'm trying to scrape a Wikipedia page from Python. Following instructions
> here:
>
> http://en.wikipedia.org/wiki/Wikipedia:Database_download
> http://en.wikipedia.org/wiki/Special:Export
>
> I use the URL "http://en.wikipedia.org/wiki/Special:Export/Train" instead
> of just "http://en.wikipedia.org/wiki/Train". But instead of getting the
> page I expect, and can see in my browser, I get an error page:
>
> >>> import urllib
> >>> url = "http://en.wikipedia.org/wiki/Special:Export/Train"
> >>> print urllib.urlopen(url).read()
>
> ...
> Our servers are currently experiencing a technical problem. This is
> probably temporary and should be fixed soon
> ...
>
> (Output is obviously truncated for your sanity and mine.)
>
> Is there a trick to downloading from Wikipedia with urllib?
>
> --
> Steven
