[Spambayes] Filtering unusual words
anthony at interlink.com.au
Tue Feb 25 13:54:16 EST 2003
>>> Bert Ungerer wrote
> Dear Spambayes developers:
> I read the interesting articles in the Linux Journal. If I understood them
> correctly, filtering and training of unusual words is critical.
> Most of the spam that I receive contains several unique artificial words.
> How do you plan to deal with that kind of spam?
Unique words are generally referred to as "hapaxes" (see the glossary at
http://spambayes.sourceforge.net/docs.html). These are going to be
ignored when the message gets scored, but this will make little or
no difference overall. There are _so_ many other clues in a typical spam
message that it doesn't matter.
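To illustrate why a made-up word carries no weight, here is a rough sketch
of one way a scorer might handle a token it has never seen in training. It
is not the actual Spambayes code: the function names, the sample counts, and
the three constants are assumptions picked for illustration, and it only
models the "never seen before" case. A never-seen token gets a neutral
probability of 0.5, and anything too close to 0.5 is dropped before scoring.

```python
# Hypothetical sketch, not the real Spambayes implementation.
# All three constants below are assumed values chosen for illustration.
UNKNOWN_PROB = 0.5       # neutral probability for a never-seen token
UNKNOWN_STRENGTH = 0.45  # how strongly that neutral prior is weighted
MIN_STRENGTH = 0.1       # clues closer than this to 0.5 are skipped


def word_prob(spam_count, ham_count, nspam, nham):
    """Smoothed spam probability of one token from its training counts."""
    n = spam_count + ham_count
    if n == 0:
        # Never seen in training: nothing is known, so stay neutral.
        return UNKNOWN_PROB
    spam_ratio = spam_count / nspam
    ham_ratio = ham_count / nham
    raw = spam_ratio / (spam_ratio + ham_ratio)
    # Pull the raw estimate toward the neutral prior; the pull weakens
    # as the token is seen more often.
    return (UNKNOWN_STRENGTH * UNKNOWN_PROB + n * raw) / (UNKNOWN_STRENGTH + n)


def usable_clues(tokens, counts, nspam, nham):
    """Return {token: probability} for tokens strong enough to be scored."""
    clues = {}
    for tok in set(tokens):
        spam_count, ham_count = counts.get(tok, (0, 0))
        p = word_prob(spam_count, ham_count, nspam, nham)
        if abs(p - 0.5) >= MIN_STRENGTH:
            clues[tok] = p
    return clues


# Made-up training counts: token -> (times seen in spam, times seen in ham).
counts = {"viagra": (40, 0), "meeting": (1, 30)}
clues = usable_clues(["viagra", "xqzvbl", "meeting"], counts,
                     nspam=100, nham=100)
```

In this sketch the invented word "xqzvbl" never makes it into `clues`,
while the two known words do, so the message is still scored on the
evidence that matters.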
As an aside, a bunch of these "random words" are actually not that
random. One quite strong spam clue in my training data is my own email
address, base64-encoded. It occurs in a wide variety of spam, as
tracking data.
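A minimal sketch of why such a token is not random: base64-encoding the
same address always yields the same string, so it shows up as the same
token in every message that embeds it. The address, the URL, and the
function names here are made up for illustration, and the sketch assumes
the address is encoded on its own rather than as part of a longer string.

```python
import base64


def address_token(address):
    """Return the base64 form of an email address as an ASCII string."""
    return base64.b64encode(address.encode("ascii")).decode("ascii")


def contains_tracking_token(message_text, address):
    """True if the base64-encoded address appears anywhere in the text."""
    return address_token(address) in message_text


# Hypothetical address and message body, for illustration only.
token = address_token("user@example.com")
sample = "Click http://example.invalid/offer?id=" + token + " to unsubscribe"
found = contains_tracking_token(sample, "user@example.com")
```

Because the encoded string is identical from one spam to the next, a
trained classifier can pick it up as a strong clue, even though it looks
like gibberish to a human reader.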
Anthony Baxter <anthony at interlink.com.au>
It's never too late to have a happy childhood.