How do I correctly download Wikipedia pages?
kursat.kutlu at gmail.com
Thu Nov 26 04:58:57 CET 2009
Try not to get caught if you send multiple requests :)
Wikipedia rejects urllib's default User-Agent, so you need to send your own.
Have a look here: http://wolfprojects.altervista.org/changeua.php
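
As its URL suggests, the linked page is about changing the User-Agent header that urllib sends. Here is a minimal sketch of that approach, assuming Python 3's urllib.request rather than the Python 2 urllib used in the quoted session. The names build_request and fetch, and the User-Agent string, are made up for illustration; use a string that identifies your own script.

```python
import urllib.request

# Hypothetical identifier (an assumption, not from the original post).
USER_AGENT = "MyWikiFetcher/0.1 (contact: you@example.com)"


def build_request(url, user_agent=USER_AGENT):
    """Return a Request that carries a custom User-Agent header
    instead of urllib's default one."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent=USER_AGENT):
    """Download the page body as bytes. This performs a network request."""
    with urllib.request.urlopen(build_request(url, user_agent)) as response:
        return response.read()

# Example call (needs network access):
#   data = fetch("http://en.wikipedia.org/wiki/Special:Export/Train")
```

If you fetch several pages, pausing between requests (for example with time.sleep) is the polite way to avoid getting blocked.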
On Nov 26, 5:45 am, Steven D'Aprano
<ste... at REMOVE.THIS.cybersource.com.au> wrote:
> I'm trying to scrape a Wikipedia page from Python. Following instructions
> I use the URL "http://en.wikipedia.org/wiki/Special:Export/Train" instead
> of just "http://en.wikipedia.org/wiki/Train". But instead of getting the
> page I expect, and can see in my browser, I get an error page:
> >>> import urllib
> >>> url = "http://en.wikipedia.org/wiki/Special:Export/Train"
> >>> print urllib.urlopen(url).read()
> Our servers are currently experiencing a technical problem. This is
> probably temporary and should be fixed soon
> (Output is obviously truncated for your sanity and mine.)
> Is there a trick to downloading from Wikipedia with urllib?
More information about the Python-list mailing list