[Tutor] Retrieving data from a web site

Peter Otten __peter__ at web.de
Sat May 18 14:44:45 CEST 2013


Phil wrote:

> On 18/05/13 19:25, Peter Otten wrote:
>>
>> Are there alternatives that give the number as plain text?
> 
> Further investigation shows that the numbers are available if I view the
> source of the page. So, all I have to do is parse the page and extract
> the drawn numbers. I'm not sure, at the moment, how I might do that but
> I have something to work with.

You can use a tool like lxml that "understands" HTML (though in this case 
you'd need a JavaScript parser on top of that, because the numbers sit in a 
JavaScript variable inside the page) -- or hack something together with 
string methods or regular expressions. For example:

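import urllib2
import json

# Fetch the raw page source.
s = urllib2.urlopen("http://*********/goldencasket").read()
# Keep what follows the variable name and drop the " = " of the assignment.
s = s.partition("latestResults_productResults")[2].lstrip(" =")
# Cut at the first semicolon; this assumes the data itself contains no ";".
s = s.partition(";")[0]
# What remains should be a JSON object.
data = json.loads(s)
lotto = data["GoldLottoSaturday"]

print lotto["drawDayDateNumber"]
# Convert the entries to ints before printing.
print map(int, lotto["primaryNumbers"])
print map(int, lotto["secondaryNumbers"])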

While this is brittle, I've found that doing it "right" is usually not 
worthwhile, as it won't survive the next website redesign either.
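
For the regular-expression variant mentioned above, here is a rough sketch. 
It is written for Python 3 and runs on a made-up page fragment rather than 
the real site, so SAMPLE_PAGE, its values, and the helper name 
extract_results() are all my own inventions; only the variable and key names 
come from the code above. Treat it as an illustration, not as a description 
of what the page really looks like.

```python
import json
import re

# Made-up stand-in for the page source; the real page will differ.
SAMPLE_PAGE = """
<script>
var latestResults_productResults = {"GoldLottoSaturday":
    {"drawDayDateNumber": "Draw 1 (made up)",
     "primaryNumbers": ["3", "11", "19", "27", "35", "42"],
     "secondaryNumbers": ["7", "21"]}};
</script>
"""


def extract_results(page, name="latestResults_productResults"):
    """Return the JSON object assigned to the JavaScript variable `name`.

    Hypothetical helper: it assumes the assignment has the form
    `name = {...};` and that no "}" followed by ";" occurs inside the data.
    """
    pattern = re.escape(name) + r"\s*=\s*(\{.*?\})\s*;"
    match = re.search(pattern, page, re.DOTALL)
    if match is None:
        raise ValueError("variable %r not found; has the page changed?" % name)
    return json.loads(match.group(1))


lotto = extract_results(SAMPLE_PAGE)["GoldLottoSaturday"]
print(lotto["drawDayDateNumber"])
print([int(n) for n in lotto["primaryNumbers"]])
print([int(n) for n in lotto["secondaryNumbers"]])
```

This is just as brittle as the partition() version; the only gain is an 
explicit error when the variable is missing instead of a confusing failure 
inside json.loads().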

PS: <http://*********/goldencasket/results/download-results>
has links to zipped csv files with the results. Downloading, inflating and 
reading these should be the simplest and best way to get your data.
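
I haven't looked inside those archives, so the following is only a sketch of 
the "inflate and read" step, again in Python 3. The helper name 
read_zipped_csv(), the file name, the column names, and the UTF-8 encoding 
are assumptions of mine. To keep it runnable without a download, it builds a 
tiny zip file in memory and reads that back.

```python
import csv
import io
import zipfile


def read_zipped_csv(zip_bytes, encoding="utf-8"):
    """Return the rows of every .csv member of a zip archive given as bytes.

    Hypothetical helper; the encoding is a guess and may need adjusting.
    """
    rows = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for member in archive.namelist():
            if not member.lower().endswith(".csv"):
                continue
            with archive.open(member) as raw:
                # archive.open() yields bytes, but csv.reader wants text.
                text = io.TextIOWrapper(raw, encoding=encoding, newline="")
                rows.extend(csv.reader(text))
    return rows


# Build a small archive in memory; the file name and columns are invented.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("results.csv", "draw,n1,n2,n3\r\n1,3,11,19\r\n2,5,8,13\r\n")

for row in read_zipped_csv(buffer.getvalue()):
    print(row)
```

With a real download you would pass the bytes returned by 
urllib.request.urlopen(url).read() to read_zipped_csv() instead of the 
in-memory buffer.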


