[Tutor] Retrieving data from a web site

Sat May 18 14:44:45 CEST 2013

Phil wrote:

> On 18/05/13 19:25, Peter Otten wrote:
>>
>> Are there alternatives that give the number as plain text?
> 
> Further investigation shows that the numbers are available if I view the
> source of the page. So, all I have to do is parse the page and extract
> the drawn numbers. I'm not sure, at the moment, how I might do that but
> I have something to work with.

You can use a tool like lxml that "understands" HTML (though in this case 
you'd need a JavaScript parser on top of that, because the numbers are 
embedded in a script rather than in the markup) -- or hack something 
together with string methods or regular expressions. For example:

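import urllib2
import json

# Fetch the page source as one string (Python 2).
s = urllib2.urlopen("http://*********/goldencasket").read()
# Keep what follows the variable name and drop the " = " before the value.
s = s.partition("latestResults_productResults")[2].lstrip(" =")
# Cut at the first ";" (assumes the data itself contains no semicolon).
s = s.partition(";")[0]
data = json.loads(s)
lotto = data["GoldLottoSaturday"]

print lotto["drawDayDateNumber"]
# The numbers arrive as strings, so convert them to integers.
print map(int, lotto["primaryNumbers"])
print map(int, lotto["secondaryNumbers"])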

While this is brittle, I've found that doing it "right" is usually not 
worthwhile, as it won't survive the next website redesign either.
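
For the regular-expression route mentioned above, a minimal sketch might 
look like the following. The page fragment is invented for illustration 
(the real page source is not reproduced here and may be laid out 
differently), and the pattern assumes the value is a single JSON object 
terminated by "};". It is just as brittle as the string-method version.

```python
import json
import re

# Invented page fragment for illustration only; the real source may differ.
page = (
    'var latestResults_productResults = '
    '{"GoldLottoSaturday": {"primaryNumbers": ["3", "11"]}};'
)

# Capture everything from the first "{" up to the first "}" followed by ";".
pattern = r"latestResults_productResults\s*=\s*(\{.*?\});"
match = re.search(pattern, page, re.DOTALL)
if match is None:
    raise ValueError("variable not found; the page layout may have changed")

data = json.loads(match.group(1))
numbers = [int(n) for n in data["GoldLottoSaturday"]["primaryNumbers"]]
print(numbers)
```

Checking for a failed match makes a redesign show up as a clear error 
instead of a confusing traceback from json.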

PS: <http://*********/goldencasket/results/download-results>
has links to zipped csv files with the results. Downloading, inflating and 
reading these should be the simplest and best way to get your data.
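
A rough sketch of the "inflating and reading" part, written for Python 3. 
The helper name read_zipped_csv, the file name, and the column layout are 
all made up for illustration; I haven't shown what the real archives 
contain, so the sample archive is built in memory. It also assumes the 
files are UTF-8 encoded.

```python
import csv
import io
import zipfile


def read_zipped_csv(zip_bytes, encoding="utf-8"):
    """Return the rows of every .csv member of a zip archive held in memory."""
    rows = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue  # ignore members that are not csv files
            text = archive.read(name).decode(encoding)
            rows.extend(csv.reader(io.StringIO(text, newline="")))
    return rows


# Build a tiny archive in memory so the sketch can be tried offline.
# The column names and values below are invented.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("results.csv", "draw,n1,n2\n1,5,17\n")

sample_rows = read_zipped_csv(buffer.getvalue())
print(sample_rows)
```

With a real download, zip_bytes would be whatever read() returns for one of 
the links on that page.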