[Tutor] Retrieving data from a web site

Phil phil_lor at bigpond.com
Sun May 19 02:02:54 CEST 2013


On 18/05/13 22:44, Peter Otten wrote:
> You can use a tool like lxml that "understands" html (though in this case
> you'd need a javascript parser on top of that) -- or hack something together
> with string methods or regular expressions. For example:
>
> import urllib2
> import json
>
> s = urllib2.urlopen("http://*********/goldencasket").read()
> s = s.partition("latestResults_productResults")[2].lstrip(" =")
> s = s.partition(";")[0]
> data = json.loads(s)
> lotto = data["GoldLottoSaturday"]
>
> print lotto["drawDayDateNumber"]
> print map(int, lotto["primaryNumbers"])
> print map(int, lotto["secondaryNumbers"])
>
> While this is brittle, I've found that doing it "right" is usually not
> worthwhile, as it won't survive the next website redesign either.
>
> PS: <http://*********/goldencasket/results/download-results>
> has links to zipped csv files with the results. Downloading, inflating and
> reading these should be the simplest and best way to get your data.
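
The quoted snippet is Python 2 (`urllib2`, `print` statements, a list-returning `map`). Below is a minimal Python 3 sketch of the same partition-and-`json.loads` idea. It works on a made-up page string instead of the real (elided) URL so it runs offline, and every value in it is invented. In Python 3 the fetch itself would be `urllib.request.urlopen(url).read().decode(...)`, with the correct encoding left as an assumption. The sketch inherits the same brittleness: it breaks if the marker name changes or if a `;` appears inside the JSON.

```python
import json


def extract_results(page, marker="latestResults_productResults"):
    """Return the JSON object assigned to `marker` in a page's inline script.

    Same string-slicing approach as the quoted snippet: keep the text after
    the marker, strip the " = " assignment, and cut at the first ";".
    """
    tail = page.partition(marker)[2].lstrip(" =")
    if not tail:
        raise ValueError("marker not found in page")
    return json.loads(tail.partition(";")[0])


# Hypothetical page fragment with invented values; the real site's markup
# is not shown in the thread.
sample = (
    "<script>var latestResults_productResults = "
    '{"GoldLottoSaturday": {"drawDayDateNumber": "Sample draw", '
    '"primaryNumbers": ["3", "17", "25"], "secondaryNumbers": ["8"]}};'
    "</script>"
)

lotto = extract_results(sample)["GoldLottoSaturday"]
print(lotto["drawDayDateNumber"])
print([int(n) for n in lotto["primaryNumbers"]])  # replaces Python 2's map()
print([int(n) for n in lotto["secondaryNumbers"]])
```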

Thanks again, Peter and Walter,

The results download link points to a historical file of past results, 
although the latest results are included at the bottom of the file. The 
file is quite large and it's zipped, so I imagine unzipping would be 
another problem. I've come across Beautiful Soup and it may also offer a 
simple solution.
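
Unzipping need not be a separate step: Python's standard `zipfile` module inflates members as they are read and can hand them straight to `csv.reader`. The sketch below assumes the download is a zip holding one or more UTF-8 CSV files (the archive's actual layout isn't described in the thread), and `read_zipped_csv` and `latest_row` are hypothetical helper names. Since the latest results are said to be at the bottom of the file, `latest_row` keeps only the final row with a bounded `deque` rather than building a list of every row.

```python
import csv
import io
import zipfile
from collections import deque


def read_zipped_csv(source):
    """Yield rows from every .csv member of a zip archive.

    `source` may be a filename or a binary file-like object; zipfile
    inflates each member while it is read, so no separate unzip step is needed.
    """
    with zipfile.ZipFile(source) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue  # skip non-CSV members such as readme files
            # archive.open() yields bytes; wrap it so csv.reader gets text.
            # UTF-8 is an assumption about the file's encoding.
            with io.TextIOWrapper(archive.open(name), encoding="utf-8", newline="") as text:
                yield from csv.reader(text)


def latest_row(source):
    """Return the final CSV row (the post says the latest draw is at the bottom)."""
    last = deque(read_zipped_csv(source), maxlen=1)  # holds at most one row
    return last[0] if last else None
```

With a saved copy of the archive, something like `latest_row("results.zip")` would return the last row as a list of strings; the filename there is a placeholder.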

Thanks for your response, Walter. I'd like to download the Australian 
Lotto results and, as far as I can see, there isn't a simple way to do 
this. I'll read up on curl; maybe I can use it.
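
curl would do the job from the shell (for example `curl -o results.zip URL`). For comparison, here is a hedged Python 3 standard-library sketch of the same download; `download` is a hypothetical helper, and the URL is whatever you pass in, since the real address is elided in the thread.

```python
import shutil
import urllib.request


def download(url, dest_path):
    """Save the resource at `url` to `dest_path`, roughly like `curl -o dest_path url`."""
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)  # copies in chunks instead of one big read()
    return dest_path
```

Streaming with `shutil.copyfileobj` avoids holding a large archive in memory before it is written to disk.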

I'll experiment with Peter's code and Beautiful Soup and see what I 
can come up with. Unzipping the file may turn out to be the best 
solution, so I'll experiment with that option as well.

-- 
Regards,
Phil

