Making search on the other site and getting data and writing in xml
Paul Boddie
paul at boddie.org.uk
Wed Sep 27 12:45:59 EDT 2006
George Sakkis wrote:
> altemurbugra at gmail.com wrote:
>
> > I don't mean Google
> > I don't mean onelook.com
> >
> > these are only examples
> >
> > I hope you understand what I mean
>
> Apparently, *you* don't understand what they're trying to tell you. It
> roughly boils down to the following:

If we just step back from the brink for a moment and give the
questioner the benefit of the doubt - that the exercise merely involves
automating some kind of interaction that would otherwise require lots
of manual messing around piloting a browser, rather than performing
some kind of bulk "suck down" of an entire site's information - then it
is obviously possible to use the following techniques:

  * Use a well-known mirroring or archiving tool such as wget.
  * Use various testing tools, some of which are written in Python.
  * Use urllib, urllib2 or httplib plus an HTML or XML parser in your
    own program.
  * Automate a Web browser using some off-the-shelf program.
  * Use various automation mechanisms provided by your environment
    (e.g. COM, DCOP), possibly with Python libraries (e.g. PAMIE [1],
    KPart Plugins [2]).

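
To make the third option a little more concrete, here is a minimal sketch
of fetching a page, pulling something out of it with an HTML parser and
writing the result as XML. It assumes a current Python 3, where the
urllib/urllib2 functionality lives in urllib.request. The names
LinkCollector, fetch and links_to_xml, the User-Agent string and the
<links>/<link> output format are all made up for illustration, and
collecting links merely stands in for whatever data you are actually
after.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen
import xml.etree.ElementTree as ET


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def fetch(url):
    """Download a page and return it as text (hypothetical helper)."""
    # The User-Agent value is an arbitrary placeholder.
    request = Request(url, headers={"User-Agent": "example-script/0.1"})
    with urlopen(request) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def links_to_xml(html):
    """Turn the links found in an HTML string into a small XML document."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    root = ET.Element("links")
    for href, text in parser.links:
        item = ET.SubElement(root, "link", href=href)
        item.text = text
    return ET.tostring(root, encoding="unicode")
```

Something like links_to_xml(fetch(url)) would then give you an XML string
for a page you are permitted to retrieve. Keeping the download and the
parsing in separate functions means the parsing can be tried out on a
saved page without touching the network at all.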
Various sites forbid wget and friends as a rule, understandably, but
there are sometimes reasons why you might want to use various tools to
automate a procedure involving lots of data which would waste a huge
amount of time if done manually. Perhaps you might have mail residing
in a Webmail system which can't be extracted via any process other than
reading all the messages in a browser, for example, or perhaps your
favourite Internet applications don't provide decent shortcuts to the
information you need, instead believing that it's all about the
"experience": surfing around watching all the animated adverts.
Automation and related technologies can legitimately help users regain
control of their Internet-resident data and make better use of the
services around it.
Paul
[1] http://pamie.sourceforge.net/
[2] http://www.boddie.org.uk/python/kpartplugins.html