[Spambayes] Filtering unusual words

Mon Feb 24 10:00:15 EST 2003

2/24/2003 2:46:39 AM, Bert Ungerer <un at ix.de> wrote:

>Dear Spambayes developers:
>
>I read the interesting articles in the Linux Journal. If I understood it 
>correctly filtering and training of unusual words is critical.
>
>Most of spam that I receive contains several unique artificial words. 
>How do you plan to deal with that kind of spam?

The answer here is: it depends.  Do the spams you receive contain ONLY 
completely unique artificial words?  Or are a few artificial words scattered 
among regular 'spammy' text?  If a message contains ONLY artificial words 
that appear in that single message and nowhere else, then I doubt the spam is 
anything other than meaningless gibberish.  In that case, Bayesian filtering 
is of limited value, since it sees each gibberish word exactly once and so 
has no training history for it.  However, if a few gibberish words are 
scattered among regular spam "buy this..." text, then the remainder of the 
text is still useful in determining the spamminess of the message.  If the 
classifier sees enough words that you've trained it to recognize (through your 
prior assertions as to what is and is not spam), then it will classify the 
mail as spam, regardless of how much other gibberish is in there.
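
To make the point concrete, here is a rough sketch in Python.  It is NOT the 
Spambayes code; it is a simplified naive-Bayes-style scorer, and every name 
in it (train, token_prob, spam_score, UNKNOWN_PROB) is hypothetical.  It 
assumes a never-seen word is treated as neutral evidence (probability 0.5), 
so one-off gibberish words neither help nor hurt the score.

```python
import math
from collections import Counter

# Assumption for this sketch: a token never seen in training is neutral.
UNKNOWN_PROB = 0.5


def train(messages):
    """Count, for each token, how many training messages contain it."""
    counts = Counter()
    for text in messages:
        counts.update(set(text.lower().split()))
    return counts


def token_prob(token, spam_counts, ham_counts, n_spam, n_ham):
    """Estimate P(spam | token); unseen tokens return the neutral value."""
    s = spam_counts.get(token, 0)
    h = ham_counts.get(token, 0)
    if s + h == 0:
        return UNKNOWN_PROB
    # Add-one smoothing keeps the estimate strictly between 0 and 1.
    spam_rate = (s + 1) / (n_spam + 2)
    ham_rate = (h + 1) / (n_ham + 2)
    return spam_rate / (spam_rate + ham_rate)


def spam_score(text, spam_counts, ham_counts, n_spam, n_ham):
    """Combine per-token evidence by summing log-odds; return a 0..1 score."""
    log_odds = 0.0
    for token in set(text.lower().split()):
        p = token_prob(token, spam_counts, ham_counts, n_spam, n_ham)
        log_odds += math.log(p) - math.log(1.0 - p)  # neutral tokens add 0
    return 1.0 / (1.0 + math.exp(-log_odds))


# Tiny made-up training set, for illustration only.
spam_msgs = ["buy cheap pills now", "buy now cheap offer"]
ham_msgs = ["meeting notes for monday", "lunch on monday"]
spam_counts, ham_counts = train(spam_msgs), train(ham_msgs)
args = (spam_counts, ham_counts, len(spam_msgs), len(ham_msgs))

plain = spam_score("buy cheap pills", *args)
padded = spam_score("buy cheap pills xqzvtk wplmrf", *args)
gibberish_only = spam_score("xqzvtk wplmrf", *args)
print(plain, padded, gibberish_only)
```

In this sketch the padded message scores the same as the plain one, because 
the two invented words contribute nothing, while the gibberish-only message 
sits at the neutral 0.5 and cannot be classified either way.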

To date, we've not found much that can fool this algorithm with any degree of 
consistency.  That's not to say it isn't possible, and we expect spammers to 
try hard to break this technology, but it hasn't happened yet.  And when it 
does, we'll adjust.

Thanks for your query.  - TimS
>
>Kind regards
>
c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org