[Spambayes] Filtering unusual words

Anthony Baxter anthony at interlink.com.au
Tue Feb 25 13:54:16 EST 2003

>>> Bert Ungerer wrote
> Dear Spambayes developers:
> I read the interesting articles in the Linux Journal. If I understood them
> correctly, filtering and training of unusual words is critical.
> Most of the spam that I receive contains several unique artificial words.
> How do you plan to deal with that kind of spam?

Unique words are generally referred to as "hapaxes" (see the glossary at
http://spambayes.sourceforge.net/docs.html). They are ignored when the
message gets scored, but this makes little or no difference overall:
there are _so_ many other clues in a typical spam message that the
missing words don't matter.
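
To illustrate why skipping such words costs so little, here is a toy
sketch in Python. It is not the Spambayes scoring code: the score()
function, the simple averaging rule, and the numbers in spam_prob are
all made up for illustration. The only point is that tokens with no
training data are skipped, so the remaining clues decide the result.

```python
# Toy illustration only; this is NOT how Spambayes combines clues.
def score(tokens, spam_prob, unknown=0.5):
    """Average the spam probabilities of known tokens, skipping unseen ones."""
    # Tokens absent from the (hypothetical) training table contribute nothing.
    known = [spam_prob[t] for t in sorted(set(tokens)) if t in spam_prob]
    if not known:
        return unknown  # no evidence either way
    return sum(known) / len(known)


# Hypothetical per-token spam probabilities learned from training.
spam_prob = {"viagra": 0.99, "free": 0.9, "meeting": 0.1}

# The same message with and without two invented "random" words.
with_noise = score(["viagra", "free", "xqzvbl", "ploontrk"], spam_prob)
without_noise = score(["viagra", "free"], spam_prob)
```

In this sketch both calls return the same score, because the two
invented words never enter the calculation.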

As an aside, a bunch of these "random words" are actually not that
random. One quite strong spam clue in my training data is my own email
address, base64-encoded. It occurs in a wide variety of spam, as
tracking data.
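
For anyone wondering what such a token looks like, here is a small
sketch using Python's standard base64 module. The address
user@example.com and the helper name base64_clue() are placeholders,
not anything taken from my training data, and real spam may embed the
address inside a longer encoded string, in which case the output would
differ.

```python
import base64


def base64_clue(address: str) -> str:
    """Return the base64 encoding of an email address as text."""
    return base64.b64encode(address.encode("ascii")).decode("ascii")


# Placeholder address; substitute your own to see your "random word".
clue = base64_clue("user@example.com")
print(clue)
```

The resulting string looks like gibberish to a human, but it is the same
every time, which is why a trained filter can pick it up as a clue.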


Anthony Baxter     <anthony at interlink.com.au>   
It's never too late to have a happy childhood.
