[Tutor] Retrieving data from a web site

Sat May 18 11:25:03 CEST 2013

Phil wrote:

> On 18/05/13 16:33, Alan Gauld wrote:
>> On 18/05/13 00:57, Phil wrote:
>>> I'd like to "download" eight digits from a web site where the digits are
>>> stored as individual graphics. Is this possible using, perhaps, one of
>>> the countless number of Python modules? Is this the function of a web
>>> scraper?
>>
>> In addition to Dave's points there is also the legality to consider.
>> Images are often copyrighted (although images of digits are less
>> likely!) and sites often have conditions of use that prohibit web
>> scraping. Such sites often include scripts that analyze user activity
>> and if they suspect you of being a robot may ban your computer from
>> accessing the site - including by browser.
>>
>> So be sure that you are allowed to access the site robotically and that
>> you are allowed to download the content or you could find yourself
>> blacklisted and unable to access the site even with your browser.
>>
> 
> Thanks for the replies,
> 
> The site in question is the Lotto results page and the drawn numbers are
> not obscured. So I don't expect that there would be any legal or
> copyright problems.
> 
> I have written a simple program that checks the results, for an unlikely
> win, but I have to manually enter the drawn numbers. I thought the next
> step might be to automatically download the results.
> 
> I can see that this would be a relatively easy task if the digits were
> not displayed as graphics.

What's the URL of the page?

Are there alternatives that give the numbers as plain text?
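If such a page exists, the standard library is enough to pull the numbers out. A minimal sketch, assuming the page shows a hypothetical caption such as "Drawn numbers:" followed by the numbers as text; the caption, the function names and the regular expressions are illustrative, not taken from any real Lotto site:

```python
import re
import urllib.request


def fetch_text(url):
    """Download a page and return its body as text (assumes UTF-8)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_drawn_numbers(html, label="Drawn numbers"):
    """Return the integers that directly follow `label` in the page text.

    `label` is a hypothetical caption; change it to whatever the real
    page shows in front of the numbers.
    """
    text = re.sub(r"<[^>]+>", " ", html)  # crude tag stripping
    match = re.search(re.escape(label) + r"[:\s]*([\d\s,]+)", text)
    if match is None:
        return []
    return [int(n) for n in re.findall(r"\d+", match.group(1))]
```

Something like `extract_drawn_numbers(fetch_text(url))` could then feed your checking program instead of manual input. Stripping tags with a regex is crude, but usually good enough when all you need is a handful of numbers.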

If not, do the images have names like whatever0.jpg, whatever1.jpg, 
whatever2.jpg, ...? Then you could infer the value from the name. 
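A minimal sketch of that approach, assuming (hypothetically) that the digit is the last character of the file name before the extension. It collects the `src` of every `<img>` tag with `html.parser` and keeps only the names that match:

```python
import os
import re
from html.parser import HTMLParser
from urllib.parse import urlparse


class ImageSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


# Hypothetical naming scheme: ".../whatever7.jpg" -> 7.
# Only the last digit before the extension is used.
DIGIT_NAME = re.compile(r"(\d)\.(?:jpe?g|gif|png)$", re.IGNORECASE)


def digits_from_image_names(html):
    """Infer each digit from the file name of its image."""
    collector = ImageSrcCollector()
    collector.feed(html)
    digits = []
    for src in collector.sources:
        # Drop any query string, then look at the bare file name.
        name = os.path.basename(urlparse(src).path)
        match = DIGIT_NAME.search(name)
        if match:
            digits.append(int(match.group(1)))
    return digits
```

Images whose names don't fit the pattern, such as logos or banners, are simply skipped.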

If not, is a digit always represented by the same image? Then you could map
the image URLs to the digits.
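A minimal sketch, assuming you fill in the lookup table by hand once, after checking in a browser which image shows which digit; the URLs in `IMAGE_TO_DIGIT` below are made up:

```python
from html.parser import HTMLParser

# Hypothetical lookup table: fill it in by hand once, after checking in a
# browser which image URL shows which digit. These URLs are made up.
IMAGE_TO_DIGIT = {
    "/images/ball_a.gif": 0,
    "/images/ball_b.gif": 1,
    "/images/ball_c.gif": 2,
    # ... one entry per digit
}


class ImageSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def digits_from_image_urls(html, table=IMAGE_TO_DIGIT):
    """Translate the page's digit images into digits via the lookup table."""
    collector = ImageSrcCollector()
    collector.feed(html)
    # Images that aren't in the table (logos, banners, ...) are skipped.
    return [table[src] for src in collector.sources if src in table]
```

Use the `src` values exactly as they appear in the page's HTML, since a relative and an absolute URL for the same image won't compare equal.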