[Tutor] FETCH URLs FROM WEBSITE

Alan Gauld alan.gauld at btinternet.com
Sat Aug 1 19:42:14 CEST 2015


On 01/08/15 11:48, Gaurav Lathwal wrote:

> I want to write a script that automatically downloads all the videos hosted
> on this site :-
>
> http://www.toonova.com/batman-beyond

The first thing to ask is whether the site allows robotic downloads.
If they are funded by advertising then they may not permit it, and it
would be self-defeating to try since you would be helping close down
your source!
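One way to check is the site's robots.txt. Here is a minimal sketch using
the standard library's urllib.robotparser; the robots.txt content below is
purely hypothetical (the real file, if any, lives at
http://www.toonova.com/robots.txt and may say something different):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt lines -- stand-ins for whatever the
# real site actually publishes.
sample = [
    "User-agent: *",
    "Disallow: /private/",
]

rp = RobotFileParser()
rp.parse(sample)

# Paths not matched by a Disallow rule are permitted.
print(rp.can_fetch("*", "http://www.toonova.com/batman-beyond"))  # True
print(rp.can_fetch("*", "http://www.toonova.com/private/x"))      # False
```

In real use you would call rp.set_url(".../robots.txt") followed by
rp.read() instead of parse(), and pass your own user-agent string to
can_fetch().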

> Now, the problem I am having is, I am unable to fetch the video urls of all
> the videos.

I assume you want to fetch the videos, not just the URLs?
Fetching the URLs is easy enough, and I doubt the site would object
too strongly. But fetching the videos is much harder, since:

a) The page you give only has links to separate pages for each
    video.
b) The separate pages have a download link which is to a
    tiny url which may well change.
c) The separate page is not static HTML (or even server-generated
    HTML); it is created in part by JavaScript when the page loads.
    That means it is very likely to change on each load (possibly
    deliberately, to foil robots!)
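For the easy part (a), scraping the per-episode links out of the index
page with BeautifulSoup looks roughly like this. The HTML snippet here is
a made-up stand-in: the real tag names and classes on toonova.com are an
assumption you would need to check in the browser's developer tools.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

# Stand-in for the downloaded index page; the real markup
# (tags, classes, link paths) will differ.
html = """
<ul class="episodes">
  <li><a href="/batman-beyond-episode-1">Episode 1</a></li>
  <li><a href="/batman-beyond-episode-2">Episode 2</a></li>
</ul>
"""

base = "http://www.toonova.com/"
soup = BeautifulSoup(html, "html.parser")

# Collect an absolute URL for every anchor that has an href.
episode_urls = [urljoin(base, a["href"])
                for a in soup.find_all("a", href=True)]

for url in episode_urls:
    print(url)
```

In practice you would fetch the page first (e.g. with urllib.request or
the requests package) and then narrow find_all() down to just the links
you want. It is parts (b) and (c) -- the JavaScript-built download links
-- that BeautifulSoup alone cannot handle.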

> I mean I can manually fetch the video urls using the chrome developer's
> console, but it's too time consuming.
> Is there any way to just fetch all the video urls using BeautifulSoup ?

It's probably possible as a one-off, but it may not work reliably in
future -- assuming the site allows it in the first place.

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos
