[Tutor] FETCH URLs FROM WEBSITE
alan.gauld at btinternet.com
Sat Aug 1 19:42:14 CEST 2015
On 01/08/15 11:48, Gaurav Lathwal wrote:
> I want to write a script that automatically downloads all the videos hosted
> on this site :-
The first thing to ask is whether they allow robotic downloads
from the site. If they are funded by advertising then they may
not permit it, and it would be self-defeating to try, since you
would be helping to close down your source!
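One way to check that is the site's robots.txt. A minimal sketch using only the standard library's robotparser (the rules here are made up for illustration; in practice you would call set_url()/read() against the real site's robots.txt):

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
# Parse a sample robots.txt in memory (no network needed for the demo).
# For a real site: rp.set_url("https://example.com/robots.txt"); rp.read()
rp.parse([
    "User-agent: *",
    "Disallow: /videos/",
])

print(rp.can_fetch("*", "https://example.com/videos/clip1"))   # False
print(rp.can_fetch("*", "https://example.com/index.html"))     # True
```

If can_fetch() returns False for the pages you want, the polite answer is to stop there.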
> Now, the problem I am having is, I am unable to fetch the video urls of all
> the videos.
I assume you want to fetch the videos not just the URLs?
Fetching the URLs is easy enough and I doubt the site would object
too strongly. But fetching the videos is much harder since:
a) The page you give only has links to separate pages for each video.
b) The separate pages have a download link which is to a
tiny url which may well change.
c) The separate page is not static HTML (or even server-generated
HTML) but is built by JavaScript code when the page loads. That
means it is very likely to change on each load (possibly
deliberately so, to foil robots!)
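Step (a) is the easy part. A minimal sketch of pulling the per-video page links out of the index page with BeautifulSoup (assuming bs4 is installed; the HTML snippet and the "video-list" selector are invented here, since the real site's markup will differ, and in practice the html string would come from urllib.request.urlopen(index_url).read()):

```python
from bs4 import BeautifulSoup

# Canned snippet standing in for the downloaded index page.
html = """
<ul class="video-list">
  <li><a href="/video/1">Episode 1</a></li>
  <li><a href="/video/2">Episode 2</a></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
# Collect the href of every link inside the (hypothetical) video list.
links = [a["href"] for a in soup.select("ul.video-list a[href]")]
print(links)  # ['/video/1', '/video/2']
```

You would then need to visit each of those pages in turn, which is where problems (b) and (c) bite.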
> I mean I can manually fetch the video urls using the chrome developer's
> console, but it's too time consuming.
> Is there any way to just fetch all the video urls using BeautifulSoup ?
It's probably possible for a one-off, but it may not work reliably for
future use. Assuming the site allows it in the first place.
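For a one-off, one crude approach that sometimes works on JavaScript-built pages is to skip parsing altogether and scan the raw page source for direct media URLs with a regex. The pattern and the sample text below are illustrative only, not taken from the real site, and this is exactly the kind of thing that breaks the next time the page changes:

```python
import re

# Stand-in for the raw text of a downloaded video page.
page_source = """
var player = setup({file: "https://cdn.example.com/clips/ep1.mp4"});
<a href="https://cdn.example.com/clips/ep2.mp4">download</a>
"""

# Grab anything that looks like a direct .mp4 URL.
video_urls = re.findall(r'https?://[^\s"\']+\.mp4', page_source)
print(video_urls)
# ['https://cdn.example.com/clips/ep1.mp4', 'https://cdn.example.com/clips/ep2.mp4']
```

If the URLs only appear after the JavaScript has run, even this won't find them in the downloaded source, and you are back to a browser-based approach.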
Author of the Learn to Program web site