[Tutor] How to Load Every Revised Wikipedia Page Revision

Jostein Berntsen jbernts at broadpark.no
Thu Feb 22 06:12:50 EST 2018


On 19.02.18,15:50, Daniel Bosah wrote:
> Good day,
> 
> I'm doing research for a compsci group. I have a script that is supposed to
> load every revised page of a wikipedia article on FDR.
> 
> This script is supposed to, in a while loop:
>   access the Wikipedia API using the requests library
>   if 'continue' is in the response
>     update the query dict with the 'continue' values
>     keep updating until there are no more 'continue' (or until the API load
>     limit is reached)
>   else
>     break
> 
> Here is the code:
> 
> 
> 
> def GetRevisions():
>     url = "https://en.wikipedia.org/w/api.php" #gets the api and sets it to
> a variable
>     query = {
>     "format": "json",
>     "action": "query",
>     "titles": "Franklin D. Roosevelt",
>     "prop": "revisions",
>     "rvlimit": 500,
>     }# sets up a dictionary of the arguments of the query
> 
>     while True: # in  a while loop
>         r = requests.get(url, params = query).json() # does a request call
> for the url in the parameters of the query
>         print repr(r) #repr gets the "offical" string output of a object
>         if 'continue' in r: ## while in the loop, if the keyword is in "r"
>             query.update(r['continue']) # updates the dictionary to include
> continue in it, and keeps on printing out all instances of 'continue"
>         else: # else
>            break # quit loop
> 
> 
> 
> I want to load the full text of every revision of the Wikipedia page,
> not just the info about each revision. How can I go about that?
> 

There are several Python wrappers for the Wikipedia API available. Have
you tried any of these?

https://pypi.python.org/pypi/wikipedia

http://wikipedia.readthedocs.io/en/latest/code.html#api

https://pypi.python.org/pypi/Wikipedia-API

https://github.com/richardasaurus/wiki-api
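
If you would rather stay with requests and your own loop, here is a
minimal sketch of one way to get the text of each revision and not only
the metadata. It rests on a few assumptions that are not in your script:
that adding "content" to "rvprop" (together with "rvslots": "main") makes
the API return the wikitext, that "formatversion": 2 returns "pages" as a
list, and that the API allows at most 50 revisions per request once
content is included. The names iter_revisions and fetch_json are made up
for this example, and the print call is written so it would work on
Python 3 as well.

```python
API_URL = "https://en.wikipedia.org/w/api.php"


def fetch_json(params):
    """Default fetcher: one GET request to the API, decoded from JSON."""
    import requests  # imported here so the generator can be tried offline
    return requests.get(API_URL, params=params).json()


def iter_revisions(title, fetch=fetch_json):
    """Yield each revision of `title` as a dict, batch by batch."""
    query = {
        "format": "json",
        "formatversion": 2,  # assumption: "pages" comes back as a list
        "action": "query",
        "titles": title,
        "prop": "revisions",
        "rvprop": "ids|timestamp|content",  # "content" asks for the text
        "rvslots": "main",
        "rvlimit": 50,  # assumption: cap when content is requested
    }
    while True:
        r = fetch(query)
        for page in r.get("query", {}).get("pages", []):
            for rev in page.get("revisions", []):
                yield rev
        if "continue" not in r:
            break  # no continuation values, so this was the last batch
        # carry the continuation values into the next request
        query.update(r["continue"])
```

Under those assumptions you would loop with
"for rev in iter_revisions('Franklin D. Roosevelt'):" and read the text
from rev["slots"]["main"]["content"]. A long-lived article has a great
many revisions, so expect a lot of requests.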


Jostein



