[Spambayes] feature request

Ryan Malayter rmalayter at bai.org
Mon Dec 8 09:27:16 EST 2003


> From: Seth Goodman
> Here's an interesting feature request that comes out of 
> experimenting with training schemes.  When you are in the 
> unsure folder and hit either of the buttons "Delete As Spam" 
> or "Recover from Spam", it would be great if you would 
> re-filter the unsure folder.  In fact, I would argue that the 
> unsure folder should be re-classified after any training 
> event.  If you want to avoid unnecessary "overtraining" 
> (training on messages whose tokens are already represented in 
> sufficient number in the correct database), one good practice 
> is to manually re-filter the unsure box after each additional 
> message that you train on.  Frequently, training one unsure 
> message as spam will push the scores of other messages in the 
> unsure folder well into the spam range, making it unnecessary 
> to train on them.  Since we probably don't remember to do 
> this all the time (I sure don't), we wind up training on 
> messages that would now classify properly, thus increasing 
> the size of the (usually spam) database unnecessarily.  Since 
> many knowledgeable people on this list say that smaller 
> databases seem to be better, which is reasonable, this 
> feature would be an aid to extending the useful life of any 
> particular training set.

If that's the case, the plugin should automatically re-filter all unread
messages in the inbox as well as all messages in the unsure folder upon
each training event. That would ensure that any spam that was completely
missed gets caught as well.
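
To make the idea concrete, here is a rough sketch of "train, then
re-filter" in Python. It does not use the real SpamBayes or Outlook
plugin code: ToyClassifier, Message, refilter, train_and_refilter and the
folder names are all made up for illustration, the scoring is a crude
per-token average rather than the real chi-squared combining, and the
cutoffs are just example values.

```python
from collections import Counter
from dataclasses import dataclass

# Illustrative cutoffs (assumptions for this sketch, not plugin settings).
HAM_CUTOFF = 0.20
SPAM_CUTOFF = 0.90


@dataclass(eq=False)  # compare messages by identity, not by content
class Message:
    tokens: list
    unread: bool = False


class ToyClassifier:
    """Stand-in scorer: averages per-token spam ratios. NOT the SpamBayes math."""

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()

    def train(self, tokens, is_spam):
        (self.spam if is_spam else self.ham).update(set(tokens))

    def score(self, tokens):
        probs = []
        for tok in set(tokens):
            s, h = self.spam[tok], self.ham[tok]
            if s + h:  # ignore tokens never seen in training
                probs.append(s / (s + h))
        return sum(probs) / len(probs) if probs else 0.5


def refilter(clf, folders):
    """Re-score the unsure folder and unread inbox mail; return how many moved."""
    candidates = [("unsure", m) for m in folders["unsure"]]
    candidates += [("inbox", m) for m in folders["inbox"] if m.unread]
    moved = 0
    for src, msg in candidates:
        score = clf.score(msg.tokens)
        if score >= SPAM_CUTOFF:
            dest = "spam"
        elif score <= HAM_CUTOFF:
            dest = "inbox"
        else:
            dest = "unsure"
        if dest != src:
            folders[src].remove(msg)
            folders[dest].append(msg)
            moved += 1
    return moved


def train_and_refilter(clf, folders, msg, is_spam):
    """Train on one message, file it, then re-filter everything still pending."""
    clf.train(msg.tokens, is_spam)
    for msgs in folders.values():
        if msg in msgs:
            msgs.remove(msg)
            break
    folders["spam" if is_spam else "inbox"].append(msg)
    return refilter(clf, folders)
```

The point of returning the count is that the user (or the stats) can see
how many messages no longer needed to be trained on after a single
training event.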

Incidentally, when keeping "stats" for testing spam filters, I do this
re-filtering by hand. That way I don't skew the statistics if I don't
read my mail for a week or so.


