[Spambayes] Retraining

Tony Meyer tameyer at ihug.co.nz
Thu Feb 26 18:58:55 EST 2004


> I'm seeing a fair number of relatively targeted "random" 
> words that are helping to get quite a few messages just under 
> the spam threshold.  If a spammer is harvesting email 
> addresses from a mailing list, especially a technical one, 
> this technique is particularly easy - and dare I say, 
> particularly effective.  They can even throw words back to 
> you from one of your own postings.

Certainly the more work you put into it (or into writing your spamware), the
more effective it will be.  If you sent me a spam message and put a copy of
this message at the end of it (or put the results of an "I'm feeling lucky"
Google search for "Tony Meyer"), then I would imagine it would probably end up as
an unsure (for me), if not ham.  OTOH, the more work that's involved, the
less economical the spam is - it's certainly much quicker and easier to grab
some dictionary words than look for something specific to the target.

> >Finally there's the chance that the word is ham (appears 
> >next in ham).  
> >So it's no big deal, and may even help classification.
> 
> I'm not sure I understand how classifying ham words as spam 
> can have any possible benefit...

That's not what I meant (I'd have to go back and reread it, but I probably
worded it poorly).  What I meant is that there are three possibilities for a
random word in a message you train on - (1) the word has never been seen
before, or never will be seen again, (2) the word is one that has only been
seen in that type of message, or only will be seen in that type of message,
and (3) the word is one that is also seen in the other type of message.

For example: (1) "sdhjkfdsdsf8435hjks" will probably never be seen again.
(2)  "fat", as a random word in a spam message, could well be a word that
only appears in spam (for me).  (3)  "analysis", as a random word in a spam
message, could easily appear in ham or spam (for me).
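
To put rough numbers on those three cases, here is a minimal sketch of a
Robinson-style smoothed word probability, the general approach SpamBayes is
based on.  It is an illustration only, not the SpamBayes code: the function
name, the message counts, and the constants s=0.45 and x=0.5 are all
assumptions made for the example.

```python
def word_spam_prob(spam_count, ham_count, n_spam, n_ham, s=0.45, x=0.5):
    """Smoothed probability that a message containing the word is spam.

    spam_count / ham_count: trained spam / ham messages containing the word.
    n_spam / n_ham: total trained spam / ham messages.
    s, x: strength and value of the prior for rarely seen words
          (assumed values for this sketch).
    """
    n = spam_count + ham_count
    if n == 0:
        # A word that has never been trained gets the neutral prior.
        return x
    spam_ratio = spam_count / n_spam
    ham_ratio = ham_count / n_ham
    p = spam_ratio / (spam_ratio + ham_ratio)
    # Pull the raw estimate towards x; the pull weakens as n grows.
    return (s * x + n * p) / (s + n)


# Hypothetical training set: 100 spam and 100 ham messages.
N_SPAM, N_HAM = 100, 100

# (1) Gibberish seen once in a single spam: it gets a spammy score, but it
#     only matters if the same token ever shows up again.
gibberish = word_spam_prob(1, 0, N_SPAM, N_HAM)

# (2) "fat" seen in 11 spam and no ham: a strong spam clue.
fat = word_spam_prob(11, 0, N_SPAM, N_HAM)

# (3) "analysis" seen in 10 ham and now 1 spam: still a ham clue.
analysis = word_spam_prob(1, 10, N_SPAM, N_HAM)

print(gibberish, fat, analysis)
```

With these made-up counts, one spam that happens to contain "analysis" barely
moves the word away from the ham end, while "fat" simply becomes a slightly
stronger spam clue.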

=Tony Meyer



