[Spambayes] Train on Hashbusters?
russ_foster at comcast.net
Fri Feb 27 10:57:36 EST 2004
I'm starting to get better than 50% correct classification on word salad
spam after actively training on these messages (the rest show up as
strong "unsures").
My thinking in training on all of these messages is that, even though they
attempt to use only "common" words, they still use words that are not in
my everyday email.
Also, there appear to be a lot of "clues" in the headers that push the
classification towards the spam end.
Here's a sample of one I got recently that scored "100%" spam:
Spam Score: 100% (0.999371)

word                            spamprob   #ham  #spam
'header:Reply-To:1'             0.789475      9     60
'absolute'                      0.85899       0      2
'cognate'                       0.85899       0      2
'curricula'                     0.85899       0      2
'implacable'                    0.85899       0      2
'x-mailer:mpop web-mail 2.19'   0.896239      0      3
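
To illustrate how a handful of moderately spammy clues can add up to a
high score, here is a minimal sketch of chi-squared (Fisher-style)
combining, which as far as I understand is the scheme Spambayes uses by
default. The function names chi2q and combine are my own and are not
taken from the Spambayes source. Feeding in only the six clues listed
above should not be expected to reproduce the reported 0.999371 exactly,
since the real score may draw on clues not shown here, but it does land
well toward the spam end.

```python
import math

def chi2q(x2, v):
    """Probability that a chi-squared variable with v degrees of
    freedom (v must be even) is at least x2."""
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, v // 2):
        term *= m / i
        total += term
    # Guard against rounding pushing the sum slightly above 1.
    return min(total, 1.0)

def combine(probs):
    """Combine per-clue spam probabilities (each strictly between 0 and 1)
    into one score in [0, 1]; 0.5 means no evidence either way."""
    n = len(probs)
    if n == 0:
        return 0.5
    # Evidence for spam: how unlikely is it that all the (1 - p) are this small?
    spam_evidence = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    # Evidence for ham: the same test applied to the p values themselves.
    ham_evidence = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    return (spam_evidence - ham_evidence + 1.0) / 2.0

# The spamprob column from the clue list above (illustrative input only).
clues = [0.789475, 0.85899, 0.85899, 0.85899, 0.85899, 0.896239]
score = combine(clues)
print("combined score: %.6f" % score)
```

The point of the sketch is that no single clue needs to be damning: six
clues that all lean the same way, with nothing hammy to offset them, are
enough to push the combined score close to 1.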
Since there doesn't seem to be any scientific evidence one way or the
other, I tag everything. I suppose my database may grow unwieldy after a
while... but maybe not?
On Thu, 26 Feb 2004, Tim Stone wrote:
> On Thu, 26 Feb 2004 09:44:29 -0500, Fred Mertz <fred at lucy.com> wrote:
> > I get lots of spam with sections of random words like this:
> > Should I train on these messages?
> I think at the moment our recommendation would be to not train on those
> messages if they're correctly classified already. We are actively
> researching this technique (called "word salad"), but as of yet we've not
> seen that it is effective against our filter.