internet searching program
Michael Tobis
mtobis at gmail.com
Sat Aug 9 12:07:25 EDT 2008
I think you are talking about "screen scraping".
Your program can fetch the HTML for the page and search it for an
appropriate pattern.
Look at the source for a YouTube view page and you will see a string:
    var embedUrl = 'http://....
You can write code to search for that in the HTML text.
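A rough sketch of that search, using a regular expression. The exact
spacing and quoting in the pattern are my assumptions about how the page
writes the assignment, find_embed_url is just a name I made up, and the
sample string is invented for illustration, not copied from YouTube.

```python
import re

# Pattern for the assignment described above. The exact spacing and the
# single quotes are assumptions about how the page writes it.
EMBED_URL_RE = re.compile(r"var\s+embedUrl\s*=\s*'([^']*)'")


def find_embed_url(html):
    """Return the quoted value assigned to embedUrl, or None if absent."""
    match = EMBED_URL_RE.search(html)
    return match.group(1) if match else None


# Made-up page fragment for illustration; not real YouTube source.
sample = "<script>var embedUrl = 'http://example.com/v/abc123';</script>"
print(find_embed_url(sample))
```

Returning None when nothing matches gives you an easy way to notice
that the page layout has changed.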
But embedUrl is not a standard JavaScript item; it is part of the
YouTube design. You will be relying on that, and if YouTube changes
how they provide such a string, you will be out of luck; at best you
will have to rewrite part of your code.
In other words, screen scraping relies on the site not doing a major
reworking and on you doing a little reverse engineering of the page
source.
So how do you get the HTML into your program? In Python the answer to
that is urllib.
http://docs.python.org/lib/module-urllib.html
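Here is a minimal sketch of the fetching step. It assumes Python 3,
where the function lives in urllib.request (in Python 2 it was
urllib.urlopen). The name fetch_html, the 10 second timeout and the
UTF-8 fallback are my own choices, not anything the library requires.

```python
from urllib.request import urlopen


def fetch_html(url, timeout=10):
    """Download a page and return its body as text."""
    with urlopen(url, timeout=timeout) as response:
        raw = response.read()
        # Use the charset the server declares, if any. Falling back to
        # UTF-8 is an assumption, not something the server promises.
        charset = response.headers.get_content_charset() or "utf-8"
    # Replace undecodable bytes rather than failing on a bad page.
    return raw.decode(charset, errors="replace")
```

You would call it as fetch_html(page_url), with page_url being the
address of the view page you are interested in, and then search the
returned text for your pattern.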
mt