[Spambayes] Train on Hashbusters?

Russ Foster russ_foster at comcast.net
Fri Feb 27 10:57:36 EST 2004

I'm starting to get better than 50% correct classification on word salad 
spam after actively training on these messages. (The remainder show up as 
strong "unsures".)

My thinking in training on all of these messages is that, even though they 
attempt to use only "common" words, they still use words that are not in 
my everyday email.

Also, there appear to be a lot of "clues" in the headers that push the 
classification towards the spam end.

Here's a sample of one I got recently that scored "100%" spam:

Spam Score: 100% (0.999371)

word                                spamprob         #ham  #spam
'header:Reply-To:1'                 0.789475            9     60
'absolute'                          0.85899             0      2
'cognate'                           0.85899             0      2
'curricula'                         0.85899             0      2
'implacable'                        0.85899             0      2
'x-mailer:mpop web-mail 2.19'       0.896239            0      3
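
For anyone curious how a handful of clues like these turn into a single 
score, here is a minimal sketch of chi-squared (Fisher-style) combining, 
which is the general approach SpamBayes is known for. The function names 
(chi2q, combine) are my own for illustration and are not the SpamBayes API. 
It also uses only the six clues listed above, while the real score was 
presumably computed from more clues than this sample shows, so don't expect 
it to reproduce 0.999371.

```python
import math

def chi2q(x2, dof):
    """Chi-squared survival function P(X >= x2), for an even dof only."""
    assert dof % 2 == 0, "this closed form needs an even degrees of freedom"
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    # Guard against floating-point rounding pushing the sum past 1.0.
    return min(total, 1.0)

def combine(probs):
    """Combine per-clue spam probabilities into one score in [0, 1]."""
    n = len(probs)
    if n == 0:
        return 0.5  # no evidence either way
    ln_s = sum(math.log(1.0 - p) for p in probs)
    ln_h = sum(math.log(p) for p in probs)
    s = 1.0 - chi2q(-2.0 * ln_s, 2 * n)  # evidence for spam
    h = 1.0 - chi2q(-2.0 * ln_h, 2 * n)  # evidence for ham
    return (s - h + 1.0) / 2.0

# The spamprob column from the clue listing above.
clues = [0.789475, 0.85899, 0.85899, 0.85899, 0.85899, 0.896239]
score = combine(clues)
print("combined score: %.6f" % score)
```

The point of the sketch is that several moderately spammy clues, none of 
them above 0.9 on its own, reinforce each other into a much stronger 
combined score.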

Since there doesn't seem to be any scientific evidence one way or the 
other, I tag everything. I suppose my database may grow unwieldy after a 
while... but maybe not?


On Thu, 26 Feb 2004, Tim Stone wrote:

> On Thu, 26 Feb 2004 09:44:29 -0500, Fred Mertz <fred at lucy.com> wrote:
> >
> > I get lots of spam with sections of random words like this:
> <snip>
> > Should I train on these messages?
> I think at the moment our recommendation would be to not train on those 
> messages if they're correctly classified already.  We are actively 
> researching this technique (called "word salad"), but as of yet we've not 
> seen that it is effective against our filter.

