[Tutor] Creating a webcrawler

Steven D'Aprano steve at pearwood.info
Sat Jan 9 09:36:44 EST 2016


On Sat, Jan 09, 2016 at 12:01:35PM +1000, Whom Isac wrote:
> Hi, I want to create a web crawler but don't have any lead on which
> module to choose. I have come across Jsoup, but I am not familiar with how
> to use it in 3.5, as I tried looking at similar web crawler code from the
> 3.4 dev version.
> I just want to build that crawler to crawl through a JavaScript-enabled site
> and automatically detect a download link (for a video file).

I admire your enthusiasm, but you have set yourself a HUGELY complicated 
project.

If you just want to extract some videos, you might find this 
existing tool (written in Python!) helpful:

http://rg3.github.io/youtube-dl/
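
As a rough sketch of how that tool might be driven from a Python script: the
helper names build_command and download below are made up for illustration,
and the sketch assumes youtube-dl is already installed and on your PATH. It
only shells out to the command-line program; "-o" is youtube-dl's output
filename template option.

```python
import shutil
import subprocess


def build_command(url, output_template="%(title)s.%(ext)s"):
    """Return the youtube-dl command line for one URL (hypothetical helper)."""
    # -o sets the output filename template; the default used here names the
    # file after the video title and its extension.
    return ["youtube-dl", "-o", output_template, url]


def download(url):
    """Run youtube-dl on the URL, raising an error if the download fails."""
    # Assumption: youtube-dl is installed as a command-line program.
    if shutil.which("youtube-dl") is None:
        raise RuntimeError("youtube-dl was not found on PATH")
    subprocess.run(build_command(url), check=True)
```

Calling download() with the address of a page that youtube-dl supports should
save the video into the current directory. Whether a particular site is
supported is something you would need to check against the tool's own list.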



> And should I be using pickle to write the data to the text file/save file?

No.
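
Pickle is a binary format, not text, and unpickling data from an untrusted
source can run arbitrary code. The reply does not say what to use instead, so
the following is only one possible alternative, assumed here for illustration:
saving the collected links as JSON, which is plain text you can open in any
editor. The function names save_links and load_links are hypothetical.

```python
import json


def save_links(links, path):
    """Write a list of link strings to a human-readable JSON text file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(links, f, indent=2)


def load_links(path):
    """Read the list of links back from the JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

For a simple list of URLs, even one link per line in an ordinary text file
would probably do.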



-- 
Steve

