[Tutor] Connection/Analysis Order

Jeff Shannon jeff@ccvcorp.com
Tue, 04 Jun 2002 09:14:36 -0700


Russell Bungay wrote:

> Given my current status, I tend to favour 1, but over dial up would 2 or
> 3 make more sense?

Personally, I'd be inclined to use #2 in any case.  It depends not only on
your connection to your mailserver, but also on your connection to the
webserver that you're publishing your data on.  I'd think that you will
probably want to update the web page only once for each run of your script.
To my mind, the simplest way to do this is stepwise -- first grab all data,
then analyze all data, then publish the results.
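
A rough sketch of that stepwise shape, written for a current Python.  The
names fetch_all, analyze and publish are hypothetical stand-ins for your own
mail-reading, parsing and web-updating code, not anything from a library:

```python
def run_once(fetch_all, analyze, publish):
    """Run the three stages strictly one after another."""
    messages = fetch_all()                        # step 1: grab all data
    results = [analyze(msg) for msg in messages]  # step 2: analyze all data
    publish(results)                              # step 3: one web update per run
    return results
```

Because publish() is called exactly once, the web page is touched only once
per run, however many messages came in.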

If you *do* decide to use option #3 and go with threads, you will
*definitely* want to look into the Queue module.  Queue does exactly what
you'd need for passing chunks of data from one thread to another, safely.
Your producer thread(s) add data to the Queue, your consumer thread(s) pull
data from the Queue, and the Queue itself will make sure that no thread steps
on another's toes.  You could also have a separate "publisher" thread --
once your data has been analyzed by your parser thread, the results can be
put in another Queue to be pulled out by the publisher thread.  That thread
could accumulate results from multiple messages, and when the Queue is
empty, update your web page.  Or perhaps the publisher should wait for the
Queue to be empty for longer than some time limit, such as half a second, to
allow for some slight delays in the reading/parsing.
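
Here is a minimal sketch of that reader -> parser -> publisher arrangement.
It assumes a current Python, where the module is spelled queue rather than
Queue.  The DONE sentinel, the function names and the analyze/publish
callables are my own inventions for illustration; the half-second idle
limit is the one mentioned above.

```python
import queue
import threading

DONE = object()  # hypothetical end-of-stream marker, not part of the queue module

def reader(messages, raw_q):
    # Producer: hand each raw message to the parser thread.
    for msg in messages:
        raw_q.put(msg)
    raw_q.put(DONE)

def parser(raw_q, result_q, analyze):
    # Consumer of raw messages, producer of analysed results.
    while True:
        msg = raw_q.get()          # blocks until something is available
        if msg is DONE:
            result_q.put(DONE)     # pass the end marker down the line
            break
        result_q.put(analyze(msg))

def publisher(result_q, publish, idle_timeout=0.5):
    # Accumulate results; publish when the queue stays empty too long
    # or when the end marker arrives.
    batch = []
    while True:
        try:
            item = result_q.get(timeout=idle_timeout)
        except queue.Empty:
            if batch:
                publish(batch)
                batch = []
            continue
        if item is DONE:
            break
        batch.append(item)
    if batch:
        publish(batch)

def run_pipeline(messages, analyze, publish, idle_timeout=0.5):
    raw_q, result_q = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=reader, args=(messages, raw_q)),
        threading.Thread(target=parser, args=(raw_q, result_q, analyze)),
        threading.Thread(target=publisher, args=(result_q, publish, idle_timeout)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

No thread ever touches another thread's data directly; everything crosses
over through put() and get(), which do the locking for you.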


> I have never done any thread programming before (though Prog. Python
> makes it look easy enough), but I am concerned about different threads
> accessing the storage of the data at incompatible times and mucking it
> up and similar problems connecting to the mail server (I want to delete
> successfully analysed mails.)

Thread programming isn't *too* complicated if you're careful... ;)  If you
use a Queue to manage any data passed between threads, you'll probably be in
good shape.  Just be sure to think about all the possible conditions of "if
*this* thread does X before or after *that* thread does Y, what will the
effect be?"  It's easy to assume a particular order of occurrences, but with
threading those assumptions are dangerous -- if the order is important, then
you *must* enforce it with thread synchronization techniques (of which
Queues are perhaps the easiest...)
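
As one example of enforcing an order: if a mail must only be deleted *after*
it has been analysed successfully, let the analysing thread report each
finished message number through a Queue, and let a single thread (here the
main one) do all the deleting.  This is only a sketch under my own
assumptions: the (number, text) pairs, the None end marker, and the analyze
and delete callables are all hypothetical.

```python
import queue
import threading

def analyze_worker(work_q, done_q, analyze):
    # Report a message number only once its analysis has succeeded.
    while True:
        item = work_q.get()
        if item is None:           # end marker: tell the deleter to stop too
            done_q.put(None)
            break
        number, text = item
        try:
            analyze(text)
        except Exception:
            continue               # failed analysis: never reported, never deleted
        done_q.put(number)

def process_and_delete(messages, analyze, delete):
    work_q, done_q = queue.Queue(), queue.Queue()
    worker = threading.Thread(target=analyze_worker,
                              args=(work_q, done_q, analyze))
    worker.start()
    for item in messages:
        work_q.put(item)
    work_q.put(None)
    # Only this thread calls delete(), so the mail connection is never shared.
    while True:
        number = done_q.get()
        if number is None:
            break
        delete(number)
    worker.join()
```

A message number can only appear in done_q after analyze() has returned for
it, so "delete before analyse" simply cannot happen.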

And, while I haven't done anything with IMAP mailservers, using poplib to
access POP3 accounts *is* indeed dead easy.  :)  (I sometimes check my home
email while I'm at work, by firing up the Python interpreter and
interactively poking through it with poplib...)
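
For the curious, that sort of poking around looks roughly like this in a
current Python.  The host name, user name and password are placeholders you
would supply yourself, and list_subjects is just a helper made up for this
sketch.  It relies on the server supporting the optional TOP command, which
most do.

```python
import poplib

def open_mailbox(host, user, password):
    # Plain POP3 login; poplib.POP3_SSL takes the same steps over SSL.
    conn = poplib.POP3(host)
    conn.user(user)
    conn.pass_(password)
    return conn

def list_subjects(conn):
    """Return (message number, subject) pairs without downloading bodies."""
    count, _size = conn.stat()                  # (message count, mailbox size)
    subjects = []
    for number in range(1, count + 1):
        # top(n, 0) fetches the headers plus zero lines of the body.
        _resp, lines, _octets = conn.top(number, 0)
        for line in lines:                      # header lines arrive as bytes
            if line.lower().startswith(b"subject:"):
                text = line[len(b"subject:"):].strip()
                subjects.append((number, text.decode("utf-8", "replace")))
                break
    return subjects
```

Typical use would be conn = open_mailbox(...), then list_subjects(conn),
then conn.dele(n) for each mail you are finished with, and finally
conn.quit().  Messages marked with dele() are only removed when quit() is
called.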

Jeff Shannon
Technician/Programmer
Credit International