[spambayes-dev] A spectacular false positive
rob at hooft.net
Sat Nov 15 18:09:13 EST 2003
Skip Montanaro wrote:
> Rob> I am now training on all mistakes and unsures, plus all ham scoring
> Rob> more than 0.02 and all spam scoring less than 0.99.
> I used to use that sort of scheme as well, but it gets tedious after a while
> and just grows my training database.
> Also, when you get two of essentially the same spam, do you train on both?
> I'm trying to be careful now to minimize that sort of duplication. I have
> so many email addresses feeding into skip at mojam.com that I generally get
> multiples of everything.
I do not get a lot of true duplicates, definitely not in the non-obvious
This is my .procmailrc; it indeed has the copy-rule you mention.
# Messages that are so obviously spam that we should not train on them
* ^X-SpamBayes-Classification: spam; 1.00
# Messages that are spam but we might want to train on them
* ^X-SpamBayes-Classification: spam
# Unsure messages must be copied to the unsure folder for training
* ^X-SpamBayes-Classification: unsure
# Ham scoring 0.02 to 0.19 (i.e. not near 0.00) is eligible for training as well
* ^X-SpamBayes-Classification: ham; 0.0[2-9]
* ^X-SpamBayes-Classification: ham; 0.1[0-9]
## Split into folders
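The selection logic in the recipes above can be sketched in Python, the language SpamBayes itself is written in. This is a minimal, hypothetical sketch. The function name `training_bucket` and the labels it returns are illustrative, and it assumes the header value has the `<class>; <score>` form that the recipes match on. It mirrors the rules: "spam; 1.00" is too obvious to train on, any other spam or unsure is kept, and ham scoring 0.02 to 0.19 is kept.

```python
import re
from typing import Optional

# Assumed header format (as matched by the procmail rules): "spam; 1.00",
# "unsure; 0.50", "ham; 0.03", or just the class name with no score.
_HEADER_RE = re.compile(r"^\s*(spam|ham|unsure)(?:;\s*([01]\.\d+))?\s*$")


def training_bucket(header_value: str) -> Optional[str]:
    """Hypothetical helper: return the training folder label for a message
    with this X-SpamBayes-Classification value, or None to skip it."""
    m = _HEADER_RE.match(header_value)
    if m is None:
        return None  # unrecognised header: do not train
    label, score_text = m.group(1), m.group(2)
    score = float(score_text) if score_text is not None else None

    if label == "unsure":
        return "unsure"  # every unsure message is kept for training
    if label == "spam":
        # "spam; 1.00" is treated as too obvious to be worth training on.
        if score is not None and score >= 1.0:
            return None
        return "spam"
    # label == "ham": keep only the 0.02 to 0.19 range, like the
    # patterns "ham; 0.0[2-9]" and "ham; 0.1[0-9]".
    if score is not None and 0.02 <= score < 0.20:
        return "ham"
    return None
```

For example, `training_bucket("ham; 0.05")` selects the message for ham training, while `training_bucket("ham; 0.00")` skips it. Real mistakes (misclassified messages) still have to be found and moved by hand; this only reproduces the automatic copy rules.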
Rob W.W. Hooft || rob at hooft.net || http://www.hooft.net/people/rob/