[Tutor] Retrieving data from a web site
Phil
phil_lor at bigpond.com
Sun May 19 02:02:54 CEST 2013
On 18/05/13 22:44, Peter Otten wrote:
> You can use a tool like lxml that "understands" html (though in this case
> you'd need a javascript parser on top of that) -- or hack something together
> with string methods or regular expressions. For example:
>
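> import urllib2
> import json
>
> # Fetch the raw html of the results page
> s = urllib2.urlopen("http://*********/goldencasket").read()
> # Keep the text after the javascript variable name, minus the " = "
> s = s.partition("latestResults_productResults")[2].lstrip(" =")
> # The object literal ends at the first semicolon
> s = s.partition(";")[0]
> # The literal happens to be valid json
> data = json.loads(s)
> lotto = data["GoldLottoSaturday"]
>
> print lotto["drawDayDateNumber"]
> # The numbers are stored as strings, so convert them to int
> print map(int, lotto["primaryNumbers"])
> print map(int, lotto["secondaryNumbers"])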
>
> While this is brittle, I've found that doing it "right" is usually not
> worthwhile as it won't survive the next website redesign either.
>
> PS: <http://*********/goldencasket/results/download-results>
> has links to zipped csv files with the results. Downloading, inflating and
> reading these should be the simplest and best way to get your data.
Thanks again Peter and Walter,
The results download link points to a historical file of past results
although the latest results are included at the bottom of the file. The
file is quite large and it's zipped so I imagine unzipping would be another
problem. I've come across Beautiful Soup and it may also offer a simple
solution.
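
For what it's worth, unzipping may be less of a problem than it sounds, since the standard library's zipfile module can read an archive that is held in memory. What follows is only a rough sketch under assumptions: it is written for Python 3, the function name latest_rows is made up, and the file name, column names and values in the sample archive are invented because the layout of the real download isn't shown here. It keeps only the last few rows, so a large history file doesn't have to be stored as a list.

```python
import csv
import io
import zipfile
from collections import deque


def latest_rows(zip_bytes, count=1):
    """Return (header, last `count` rows) from the first CSV in a zip archive.

    Hypothetical helper: the real archive's layout is an assumption.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # Pick the first member whose name ends in .csv
        name = next(n for n in archive.namelist() if n.lower().endswith(".csv"))
        with archive.open(name) as raw:
            # archive.open() yields bytes, so wrap it for the csv module
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            reader = csv.reader(text)
            header = next(reader)
            # A bounded deque keeps only the final rows while streaming
            tail = list(deque(reader, maxlen=count))
    return header, tail


# Build a tiny archive in memory so the sketch runs without a download.
# The file name, columns and values below are invented for illustration.
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr(
        "results.csv",
        "draw,date,n1,n2\n1,20130511,3,17\n2,20130518,8,21\n",
    )

header, rows = latest_rows(sample.getvalue())
print(header)
print(rows)
```

With a real download, the bytes returned by reading the URL would take the place of sample.getvalue().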
Thanks for your response, Walter. I'd like to download the Australian
Lotto results and there isn't a simple way, as far as I can see, to do
this. I'll read up on curl; maybe I can use it.
I'll experiment with Peter's code and Beautiful Soup and see what I
can come up with. Maybe unzipping the file could be the best solution;
I'll experiment with that option as well.
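
As a starting point for that experiment, here is a rough sketch of the partition approach run against a made-up page fragment, so it can be tried without hitting the site. It assumes Python 3 (hence the list comprehensions instead of printing map directly). The marker name and dictionary keys come from Peter's code; the function name extract_results, the surrounding markup and all the values are invented.

```python
import json

# Hypothetical page fragment: only the variable name and keys come from
# the quoted code, the markup and numbers are invented for illustration.
page = """<script>
var latestResults_productResults = {"GoldLottoSaturday": {"drawDayDateNumber": "Sat 18/05/13, Draw 3321", "primaryNumbers": ["4", "11", "19", "23", "30", "41"], "secondaryNumbers": ["7", "36"]}};
</script>"""


def extract_results(html, marker="latestResults_productResults"):
    """Pull the javascript object literal that follows `marker` out of html."""
    before, found, after = html.partition(marker)
    if not found:
        raise ValueError("marker not found, the page layout may have changed")
    # Drop the " = " between the variable name and the literal
    tail = after.lstrip(" =")
    # The literal ends at the first semicolon; this breaks if the data
    # itself ever contains one, which is part of why the approach is brittle
    literal = tail.partition(";")[0]
    return json.loads(literal)


lotto = extract_results(page)["GoldLottoSaturday"]
primary = [int(n) for n in lotto["primaryNumbers"]]
secondary = [int(n) for n in lotto["secondaryNumbers"]]

print(lotto["drawDayDateNumber"])
print(primary)
print(secondary)
```

Raising an error when the marker is missing at least makes a site redesign show up as a clear failure rather than a confusing json error.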
--
Regards,
Phil