[spambayes-dev] A spectacular false positive

Rob Hooft rob at hooft.net
Sat Nov 15 18:09:13 EST 2003

Skip Montanaro wrote:
>     Rob> I am now training on all mistakes and unsures, plus all ham scoring
>     Rob> more than 0.02 and all spam scoring less than 0.99. 
> I used to use that sort of scheme as well, but it gets tedious after a while
> and just grows my training database. 
> Also, when you get two of essentially the same spam, do you train on both?
> I'm trying to be careful now to minimize that sort of duplication.  I have
> so many email addresses feeding into skip at mojam.com that I generally get
> multiples of everything.

I do not get a lot of true duplicates, definitely not in the non-obvious 

This is my .procmailrc; it indeed has the copy-rule you mention.

:0 fw:hamlock
| /home/h/hooft/bin/sb_filter.py

# Messages that are so obviously spam that we should not train on them
* ^X-SpamBayes-Classification: spam; 1.00

# Messages that are spam but we might want to train on them
* ^X-SpamBayes-Classification: spam

# Unsure messages must be copied to the unsure folder for training
:0 c
* ^X-SpamBayes-Classification: unsure

# Ham that doesn't score 0.00 is eligible for training as well
:0 c
* ^X-SpamBayes-Classification: ham; 0.0[2-9]

:0 c
* ^X-SpamBayes-Classification: ham; 0.1[0-9]

## Split into folders
* ^List-Id:.*python-announce-list

## Etc.
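
For anyone who wants to apply the same selection outside procmail, here is a rough Python sketch of the decision the recipes above encode. It assumes the header value has the form "label; score" with a two-decimal score, as in the conditions above; the function name is_training_candidate and the regular expression are only illustrative and are not part of SpamBayes. It follows the recipes as written (ham at 0.02 or higher, spam at anything other than 1.00). Mistakes cannot be detected from the header alone, so those still have to be picked out by hand.

```python
import re

# Matches header values such as "spam; 1.00", "unsure; 0.50", "ham; 0.03".
# The format is assumed from the procmail conditions, not taken from SpamBayes itself.
HEADER_RE = re.compile(r"^(ham|spam|unsure);\s*(\d\.\d+)$")


def is_training_candidate(header_value):
    """Return True if a message with this X-SpamBayes-Classification
    value should be copied aside for training (hypothetical helper)."""
    match = HEADER_RE.match(header_value.strip())
    if match is None:
        raise ValueError("unrecognised classification: %r" % header_value)
    label = match.group(1)
    score = float(match.group(2))
    if label == "unsure":
        # Every unsure message is worth training on.
        return True
    if label == "ham":
        # Ham scoring 0.00 or 0.01 is considered obvious and is skipped.
        return score >= 0.02
    # Spam scoring exactly 1.00 is considered obvious and is skipped.
    return score < 1.00


if __name__ == "__main__":
    for value in ("spam; 1.00", "spam; 0.97", "unsure; 0.50", "ham; 0.00", "ham; 0.15"):
        print(value, "->", is_training_candidate(value))
```

The cutoffs are hard-coded to mirror the recipes; anyone with a different ham or spam cutoff would need to adjust them.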

Rob W.W. Hooft  ||  rob at hooft.net  ||  http://www.hooft.net/people/rob/
