[spambayes-dev] race conditions in imap filtering?

Tue Sep 9 15:38:44 EDT 2003

The other day I started reading my mail from an IMAP server, so I 
started using sb_imapfilter.py to filter my mail.  Looking at the code, 
I have to wonder whether it could corrupt my database or my e-mail.

The problem is this.  There is an option -c to classify mail on the IMAP 
server, and there is an option -l <minutes> to loop and classify every 
<minutes>.  When the filter starts up it opens the database, and it 
never closes it or re-reads it.  If I want to train on some messages, I 
could do that while the imapfilter is still running, but then it won't 
pick up the new ham and spam counts.  I suppose it will pick up the new 
word counts, since not everything is held in-core, so the two will be 
out of step and the results will be off.

If, however, I want to stop the filter before training, the only way to 
do so is to interrupt it, and then there is a chance that it is in the 
middle of classifying my e-mail.  I don't know whether that can corrupt 
the e-mail (the filter does make changes such as adding headers).

The first problem could be resolved by opening the database at the 
beginning of each iteration.  This doesn't seem too difficult.
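
A rough sketch of the idea, not the actual sb_imapfilter.py code: the 
names load_counts and filter_loop, and the use of a pickle file to hold 
the counts, are assumptions made up for illustration.  The point is only 
that the database is read afresh at the top of every pass rather than 
once at startup.

```python
import pickle
import time


def load_counts(path):
    """Read the training data from disk (hypothetical pickle layout)."""
    with open(path, "rb") as f:
        return pickle.load(f)


def filter_loop(path, classify_pass, passes, delay_seconds=0.0):
    """Run `classify_pass` repeatedly, reloading the database each time.

    `classify_pass` is a hypothetical callable that classifies the
    mailbox once using the freshly loaded data.
    """
    for n in range(passes):
        counts = load_counts(path)  # fresh read, so new training is seen
        classify_pass(counts)
        if n + 1 < passes:
            time.sleep(delay_seconds)  # stands in for the -l <minutes> wait
```

With this shape, training done between two passes is picked up by the 
next pass without restarting the filter.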

The second problem is harder, but a solution could be to use locking of 
the database, so that while training is in progress the filter doesn't 
classify.
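
To make the locking idea concrete, here is a minimal sketch using an 
advisory lock on a separate lock file.  It assumes a Unix system (it 
relies on fcntl.flock), and the name database_lock and the lock file 
path are hypothetical; nothing like this is claimed to exist in 
spambayes.  Both the trainer and the filter would have to wrap their 
database access in the same lock for it to help.

```python
import fcntl
from contextlib import contextmanager


@contextmanager
def database_lock(lock_path, blocking=True):
    """Hold an exclusive advisory lock on `lock_path` (Unix only).

    With blocking=False, BlockingIOError is raised if another holder
    (for example, a training run) already has the lock.
    """
    f = open(lock_path, "w")
    try:
        flags = fcntl.LOCK_EX
        if not blocking:
            flags |= fcntl.LOCK_NB
        fcntl.flock(f, flags)
        try:
            yield
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
    finally:
        f.close()
```

The filter could take the lock non-blocking at the start of a pass and 
simply skip that pass if training is in progress, while the trainer 
takes it blocking and so waits for a pass that is already running to 
finish.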

Opinions?

Do the same problems occur in the pop proxy?