[Spambayes] Filtering unusual words
Tim Stone - Four Stones Expressions
tim at fourstonesExpressions.com
Mon Feb 24 10:00:15 EST 2003
2/24/2003 2:46:39 AM, Bert Ungerer <un at ix.de> wrote:
>Dear Spambayes developers:
>I read the interesting articles in the Linux Journal. If I understood it
>correctly, filtering and training on unusual words is critical.
>Most of the spam that I receive contains several unique artificial words.
>How do you plan to deal with that kind of spam?
The answer here is: it depends. Do the spams you receive contain ONLY
completely unique artificial words? Or are a few artificial words scattered
among regular 'spammy' text? If they contain ONLY artificial words that are
unique to a single spam, then I doubt that the spam is anything other than
meaningless gibberish. In that case, Bayesian filtering is of limited value,
since it will see each gibberish word exactly once and so has nothing to
learn from it. However, if a few gibberish words are scattered among regular
spam "buy this..." text, then the remainder of the text will be useful in
determining the spamminess of the message. If the classifier sees enough
words that you've trained it to recognize (through your prior assertions as
to what is and is not spam), then it will classify the mail as spam,
regardless of how much other gibberish is in there.
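A rough sketch of why the gibberish doesn't matter. This is NOT the Spambayes
code; it is a simplified naive-Bayes-style scorer, and every name in it
(train, word_prob, spam_score, UNKNOWN_PROB, the toy training messages) is
made up for illustration. It assumes a never-seen word is given a neutral
probability of 0.5, so it adds the same amount of evidence to both sides and
cancels out.

```python
import math
from collections import Counter

UNKNOWN_PROB = 0.5  # assumption: a never-seen word counts as neutral evidence


def train(messages):
    """Count, per word, the number of messages in which it appears."""
    counts = Counter()
    for text in messages:
        counts.update(set(text.lower().split()))
    return counts


def word_prob(word, spam_counts, ham_counts, n_spam, n_ham):
    """Estimate how spammy a single word is, clamped away from 0 and 1."""
    s = spam_counts[word] / n_spam
    h = ham_counts[word] / n_ham
    if s + h == 0:
        return UNKNOWN_PROB  # unique gibberish lands here
    return min(0.99, max(0.01, s / (s + h)))


def spam_score(text, spam_counts, ham_counts, n_spam, n_ham):
    """Combine per-word probabilities in log space; returns a value in (0, 1)."""
    words = set(text.lower().split())
    probs = [word_prob(w, spam_counts, ham_counts, n_spam, n_ham) for w in words]
    log_spam = sum(math.log(p) for p in probs)
    log_ham = sum(math.log(1.0 - p) for p in probs)
    return 1.0 / (1.0 + math.exp(log_ham - log_spam))


# Toy training data, invented for this example.
SPAM = ["buy cheap pills now", "buy now limited offer"]
HAM = ["meeting agenda for monday", "lunch on monday"]
spam_counts, ham_counts = train(SPAM), train(HAM)


def score(text):
    return spam_score(text, spam_counts, ham_counts, len(SPAM), len(HAM))


mixed = score("xqzvb buy now plorft zzkwq")  # spammy words plus gibberish
clean = score("buy now")                     # the same spammy words alone
gibberish_only = score("xqzvb plorft zzkwq") # nothing the filter has seen
```

With this toy data, mixed and clean come out the same (the gibberish
contributes nothing either way), while gibberish_only sits at the neutral 0.5:
the filter has no opinion on a message made up entirely of words it has never
seen.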
To date, we've not found much that can fool this algorithm with any degree of
certainty and consistency. That's not to say that it's not possible, and we
believe that spammers will try desperately to break this technology,
but it hasn't happened yet. And when it does, we'll adjust.
Thanks for your query. - TimS
c'est moi - TimS