[spambayes-dev] A spectacular false positive
skip at pobox.com
Sat Nov 15 13:56:19 EST 2003
Rob> I am now training on all mistakes and unsures, plus all ham scoring
Rob> more than 0.02 and all spam scoring less than 0.99.
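For concreteness, the rule Rob describes can be sketched as a small Python predicate. This is only an illustration: the name `should_train` and its arguments are hypothetical and are not part of SpamBayes. It assumes the classifier's ham and spam cutoffs both lie between 0.02 and 0.99, in which case every mistake and every unsure is already caught by the two thresholds.

```python
def should_train(is_spam, score):
    """Decide whether a hand-verified message goes into the training set.

    is_spam -- the human verdict on the message (True for spam).
    score   -- the classifier's spam probability, from 0.0 to 1.0.

    Hypothetical helper: assumes the ham/spam cutoffs sit between 0.02
    and 0.99, so mistakes and unsures fall inside the ranges below.
    """
    if is_spam:
        # Spam that did not score almost perfectly, including unsures
        # and spam that was misfiled as ham.
        return score < 0.99
    # Ham that did not score almost perfectly, including unsures
    # and ham that was misfiled as spam.
    return score > 0.02
```

Under this rule a ham scoring 0.0 and a spam scoring 1.0 are left alone, and everything else is trained on.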
I used to use that sort of scheme as well, but it gets tedious after a while
and just grows my training database. The problem was that most ham scored
0.0 and after concluding a message was ham I let procmail toss it in the
proper mailbox. This meant that the few hams which didn't score 0.0 were
scattered all over the place, so I had to constantly be on the lookout for
them. I suppose I could have added a copy rule to my procmailrc file to
save all non-zero ham, but that would have just been another mailbox to look
at. I already have unsure, lospam and hispam. That would add hiham.
Also, when you get two of essentially the same spam, do you train on both?
I'm trying to be careful now to minimize that sort of duplication. I have
so many email addresses feeding into skip at mojam.com that I generally get
multiples of everything.
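One way to cut down on that duplication before training is to fingerprint each message body and keep only the first copy. The sketch below is a guess at how that might look, not something I actually run: `body_fingerprint` and `drop_duplicates` are made-up names, and it only catches copies whose bodies match after lowercasing and whitespace normalization. Spams that differ by a random token or a personalized line would still get through.

```python
import email
import hashlib


def body_fingerprint(raw_message):
    """Return a hash of the message body, ignoring all headers.

    Hypothetical helper: copies delivered through different addresses
    differ in their headers, so only the body text is hashed.
    """
    msg = email.message_from_string(raw_message)
    parts = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no text of their own
        payload = part.get_payload()
        if isinstance(payload, str):
            parts.append(payload)
    # Lowercase and collapse whitespace so trivial reflowing is ignored.
    text = " ".join(" ".join(parts).lower().split())
    return hashlib.sha1(text.encode("utf-8", "replace")).hexdigest()


def drop_duplicates(raw_messages):
    """Keep the first message seen for each distinct body fingerprint."""
    seen = set()
    unique = []
    for raw in raw_messages:
        fingerprint = body_fingerprint(raw)
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(raw)
    return unique
```

Running the candidate spams through `drop_duplicates` before training would leave one message per distinct body.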
Finally, I also gave up on training on low-scoring spams. If it's spam and
not a mistake, it's good enough for me.
At the moment I have a training database of 133 spams and 111 hams.