Web Spider

Thomas Lindgaard thomas at it-snedkeren.BLACK_HOLE.dk
Wed Jul 7 11:28:08 CEST 2004

On Tue, 06 Jul 2004 11:19:01 -0400, Peter Hansen wrote:

> Answered indirectly in this FAQ:
> http://www.python.org/doc/faq/programming.html#how-do-i-find-the-current-module-name

Let me just see if I understood this correctly...

The reason for using the construct is to have two "modes" for the script:
one for running the script by itself (i.e. run main()) and one for when it
is imported from somewhere else (i.e. main() should not be run unless
called from the surrounding code).
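
If I have it right, the pattern boils down to something like the sketch
below. The main() body and its return value are placeholders of mine, not
taken from the spider code, and the syntax is Python 3.

```python
def main():
    # Placeholder for the script's real work.
    return "running as a script"


if __name__ == "__main__":
    # True only when this file is executed directly; when the file is
    # imported, __name__ holds the module's name and main() is not called.
    print(main())
```

Importing the file from another module then gives access to main() without
running it.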

>> 2) In Retrievepool.__init__ the Retriever.__init__ is called with
>> self.inputQueue and self.outputQueue as arguments. Does this mean that
>> each Retriever thread has a reference to Retrievepool.inputQueue and
>> Retrievepool.outputQueue?
> Yes, and that's sort of the whole point of the thing.

Okidoki :)

>> 3) How many threads will be running? Spider.run initializes the
>> Retrievepool and this will consist of MAX_THREADS threads, so once the
>> crawler is running there will be the main thread (caught in the while
>> loop in Spider.run) and MAX_THREADS Retriever threads running, right?
> Yep.  Good analysis. :-)  You could inject this somewhere to check:

Thanks - sometimes it actually helps to read code you want to elaborate on
closely :)

> print len(threading.enumerate()), 'threads exist'

Can a thread die spontaneously if for instance an exception is thrown?
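
A small experiment along these lines should show what happens. It is only a
sketch: failing_worker and healthy_worker are names made up for the test
and have nothing to do with the spider's Retriever class, and the syntax is
Python 3. As far as I can tell, an uncaught exception ends only the thread
it was raised in (the traceback goes to stderr) and the other threads carry
on.

```python
import threading

stop = threading.Event()


def failing_worker():
    # Uncaught exception: this thread ends, the rest of the program does not.
    raise RuntimeError("simulated failure in a worker")


def healthy_worker():
    # Blocks until the main thread sets the event.
    stop.wait()


bad = threading.Thread(target=failing_worker)
good = threading.Thread(target=healthy_worker)
bad.start()
good.start()
bad.join()  # returns once the failing thread has died

bad_alive = bad.is_alive()
good_alive = good.is_alive()
print(len(threading.enumerate()), 'threads exist')
print('failing thread alive:', bad_alive)
print('healthy thread alive:', good_alive)

# Let the healthy worker finish so the program can exit cleanly.
stop.set()
good.join()
```

If that is right, a pool of worker threads could shrink silently unless
each worker catches exceptions inside its own loop.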
