[Tutor] Writing a web bot.
Remco Gerlich
scarblac@pino.selwerd.nl
Sat, 8 Jul 2000 23:49:09 +0200
On Fri, Jul 07, 2000 at 06:29:53PM -0400, Furmanek, Greg wrote:
> Hi all.
>
> It appears I have found myself in a position
> where I could use some help.
>
> The task I am trying to perform is write an
> internet bot. I was going to use urllib for
> this project however one of the requirements
> is for the connection to be continuous during
> the session.
>
> Connect to a site.
> Get page, parse.
> Get another page, parse.
> use POST method, get another page, parse.
> Disconnect from the site.
>
> The connection is not supposed to be dropped
> between the requests.
>
> Is there a simple way to do this task???
I've never needed to do this and I haven't studied urllib. But can't
you change urllib so that it reuses an existing connection if it has
already opened one? Find out how it opens connections, then redefine
those functions or subclass the relevant class in your own module,
something like that.
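As a rough sketch of the "reuse one connection" idea: this does not patch
urllib itself. It assumes a current Python, where the lower-level
http.client module keeps a socket open between requests if the server
allows HTTP/1.1 keep-alive. The names run_session, fetch_site and steps
are made up for the illustration, and I haven't tried it against your site.

```python
import http.client
import urllib.parse


def run_session(conn, steps):
    """Send every (method, path, form) step over the one connection `conn`."""
    pages = []
    for method, path, form in steps:
        body, headers = None, {}
        if form is not None:
            # Encode form fields the way an HTML form POST would.
            body = urllib.parse.urlencode(form)
            headers["Content-Type"] = "application/x-www-form-urlencoded"
        conn.request(method, path, body=body, headers=headers)
        response = conn.getresponse()
        # Read the whole body; http.client cannot send the next request
        # on this connection until the previous response has been read.
        pages.append((response.status, response.read()))
    return pages


def fetch_site(host, steps):
    """Open one connection to `host`, run all steps, then disconnect."""
    conn = http.client.HTTPConnection(host)
    try:
        return run_session(conn, steps)
    finally:
        conn.close()
```

You would call it with something like
fetch_site("www.example.com", [("GET", "/", None), ("POST", "/login", {"user": "greg"})])
where the host, paths and form fields are placeholders. One caveat: if the
server answers with "Connection: close", http.client quietly opens a new
socket for the next request, so this alone does not guarantee a single
TCP connection for the whole session.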
Also, websucker.py and webchecker.py have already been written (they're
in Python's Tools/ directory, maybe you need to download the source
distribution to get them). These tools download a whole site or check links
in a whole site. You probably want something like that. A script to parse
robots.txt files is also included.
But they use separate connections for each file, I think...
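For the robots.txt part, here is a small sketch of the kind of check a bot
should make before fetching a page. It assumes a current Python, where a
robots.txt parser lives in the standard library as urllib.robotparser; I
am treating that as a stand-in for the script in Tools/. The helper name
make_checker, the agent name "tutor-bot" and the sample rules are invented
for the example.

```python
import urllib.robotparser


def make_checker(robots_txt, agent):
    """Return a function that says whether `agent` may fetch a given URL."""
    parser = urllib.robotparser.RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return lambda url: parser.can_fetch(agent, url)


# Hypothetical rules: everything is allowed except /private/.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

allowed = make_checker(SAMPLE_ROBOTS, "tutor-bot")
```

With these sample rules, allowed("http://www.example.com/index.html") is
true and allowed("http://www.example.com/private/a.html") is false. For a
real site you would fetch its robots.txt first and pass the text in.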
--
Remco Gerlich, scarblac@pino.selwerd.nl