[Tutor] How to Load Every Revised Wikipedia Page Revision
Jostein Berntsen
jbernts at broadpark.no
Thu Feb 22 06:12:50 EST 2018
On 19.02.18,15:50, Daniel Bosah wrote:
> Good day,
>
> I'm doing research for a compsci group. I have a script that is supposed to
> load every revised page of a wikipedia article on FDR.
>
> This script is supposed to, in a while loop:
>     access the Wikipedia API using the requests library
>     if 'continue' is in the response
>         update the query dict with the 'continue' values
>         keep updating until there are no more 'continue' (or until the API
>         load limit is reached)
>     else
>         break
>
> Here is the code:
>
>
>
> import requests
>
> def GetRevisions():
>     url = "https://en.wikipedia.org/w/api.php"  # the API endpoint
>     query = {
>         "format": "json",
>         "action": "query",
>         "titles": "Franklin D. Roosevelt",
>         "prop": "revisions",
>         "rvlimit": 500,
>     }  # sets up a dictionary of the arguments of the query
>
>     while True:  # in a while loop
>         # does a request call for the url with the parameters of the query
>         r = requests.get(url, params=query).json()
>         print(repr(r))  # repr gets the "official" string output of an object
>         if 'continue' in r:  # while in the loop, if the keyword is in "r"
>             # updates the dictionary to include continue in it, and keeps
>             # on printing out every batch until there is no 'continue'
>             query.update(r['continue'])
>         else:
>             break  # quit loop
>
>
>
> I want to load every page version with the revisions of the wikipedia page,
> not just the info about the page revision. How can I go about that?
>
There are several Python wrappers for the Wikipedia API available. Have you
tried any of these?
https://pypi.python.org/pypi/wikipedia
http://wikipedia.readthedocs.io/en/latest/code.html#api
https://pypi.python.org/pypi/Wikipedia-API
https://github.com/richardasaurus/wiki-api
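
If you would rather stay with the raw API, the loop in the original post
mostly needs to ask for the revision text as well as the metadata. Below is a
minimal sketch, not a finished script. It assumes the `rvprop=content` and
`rvslots=main` parameters of the MediaWiki revisions module, and it uses
`formatversion=2` so that pages come back as a list. It also swaps `requests`
for the standard library's `urllib` so it has no dependencies. The names
`fetch_json`, `iter_revisions` and `revision_text`, and the User-Agent
string, are made up for illustration. As far as I know the API caps `rvlimit`
at 50 when content is requested, so a long article will take many requests.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def fetch_json(params):
    """GET the API with the given parameters and decode the JSON reply."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    # Placeholder User-Agent (an assumption); replace with your own contact info.
    request = urllib.request.Request(
        url, headers={"User-Agent": "revision-loader-example/0.1"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def iter_revisions(title, fetch=fetch_json, batch=50):
    """Yield each revision dict of `title`, following 'continue' until done."""
    query = {
        "format": "json",
        "formatversion": 2,
        "action": "query",
        "titles": title,
        "prop": "revisions",
        "rvprop": "ids|timestamp|content",  # ask for the text, not just metadata
        "rvslots": "main",
        "rvlimit": batch,
    }
    while True:
        reply = fetch(query)
        for page in reply.get("query", {}).get("pages", []):
            for revision in page.get("revisions", []):
                yield revision
        if "continue" not in reply:
            break
        # Same idea as the original loop: merge the continuation values
        # into the query and ask for the next batch.
        query.update(reply["continue"])


def revision_text(revision):
    """Return the wikitext of one revision, or '' if the text is hidden."""
    return revision["slots"]["main"].get("content", "")
```

Something like `for rev in iter_revisions("Franklin D. Roosevelt"):
print(rev["revid"], len(revision_text(rev)))` would then walk the whole
history. The `fetch` argument is only there so the paging logic can be tried
against canned replies without hitting the network.
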
Jostein