[Spambayes] Re: CRM114 in November breaks 99.9%. :-)

Ken Anderson kanderson@bbn.com
Mon Dec 2 16:04:30 2002


The "train only on errors" rule bothers me. Can you say what you use as the training set and what you use as the test set?
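For concreteness, here is a minimal sketch of what an online "train only on errors" evaluation usually looks like. It is not CRM114's code: `ToyFilter`, `classify`, `learn` and `train_on_errors` are hypothetical names, and the word-count scoring is a deliberately crude stand-in for a real classifier. It illustrates the property the question turns on. In such a protocol there is no separate held-out test set. Every message is scored before the filter has trained on it, and the filter is updated only when that score is wrong.

```python
from collections import Counter
from typing import Iterable, Tuple


class ToyFilter:
    """Hypothetical stand-in classifier: per-class token counts (not CRM114)."""

    def __init__(self) -> None:
        self.counts = {True: Counter(), False: Counter()}

    def classify(self, text: str) -> bool:
        """Return True (spam) if spam token counts outweigh nonspam ones."""
        tokens = text.lower().split()
        spam = sum(self.counts[True][t] for t in tokens)
        ham = sum(self.counts[False][t] for t in tokens)
        return spam > ham

    def learn(self, text: str, is_spam: bool) -> None:
        """Add the message's tokens to the counts for its true class."""
        self.counts[is_spam].update(text.lower().split())


def train_on_errors(stream: Iterable[Tuple[str, bool]],
                    clf: ToyFilter) -> Tuple[int, int, int]:
    """Score each message before training on it; train only on mistakes.

    Returns (total, false_accepts, false_rejects).
    """
    total = false_accepts = false_rejects = 0
    for text, is_spam in stream:
        total += 1
        verdict = clf.classify(text)  # scored before the filter has seen it
        if verdict != is_spam:
            if is_spam:
                false_accepts += 1  # spam let through
            else:
                false_rejects += 1  # nonspam flagged as spam
            clf.learn(text, is_spam)  # correct the filter only on errors
    return total, false_accepts, false_rejects


stream = [("cheap diesel parts", True), ("meeting notes attached", False),
          ("cheap diesel parts now", True), ("meeting notes", False)]
print(train_on_errors(stream, ToyFilter()))  # (4, 1, 0)
```

Only the first message is misclassified (the empty filter accepts it), so it is the only one trained on. The later spam is caught by what was learned from that single error.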

At 09:44 AM 12/2/2002, Bill Yerazunis wrote:

>Final test statistics for CRM114 for November are in:
>
>Standard rules apply (no whitelists, no blacklists, realtime email stream
>only (no "canned spam"), train only on errors, polynomial length 5)
>
>        For All of November (starting 9 AM Nov 1, ending 9 AM Dec 1)
>
>    Spams  Nonspams  False    False    Total   N+1 Accuracy  NHC's
>                     Accepts  Rejects  Emails  (%)
>    1993   3914      4        0        5911    99.915        2
>
>    Spam features in hash tables:    398K
>    Nonspam features in hash tables: 299K
>
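The "polynomial length 5" rule presumably refers to CRM114's sparse binary polynomial hashing (SBPH), which is what populates those hash tables. Below is a simplified sketch. It assumes a window of 5 words in which the newest word is always kept and every subset of the up-to-4 preceding words is emitted, giving up to 2**4 = 16 hashed features per word. The `<skip>` marker, the use of `zlib.crc32` as the hash, and the name `sbph_features` are illustrative assumptions, not CRM114's actual implementation.

```python
import zlib
from typing import List


def sbph_features(text: str, window: int = 5) -> List[int]:
    """Simplified sparse binary polynomial hashing (SBPH) sketch.

    For each word, consider it together with up to window-1 preceding words.
    The newest word is always kept; every subset of the preceding words is
    emitted, with dropped words replaced by a skip marker, so each word
    yields up to 2 ** (window - 1) hashed features.
    """
    words = text.split()
    features: List[int] = []
    for i, newest in enumerate(words):
        history = words[max(0, i - window + 1):i]  # up to window-1 earlier words
        for mask in range(2 ** len(history)):
            # bit j of mask decides whether history[j] is kept or skipped
            parts = [w if (mask >> j) & 1 else "<skip>"
                     for j, w in enumerate(history)]
            parts.append(newest)
            phrase = " ".join(parts)
            # crc32 is a stand-in deterministic 32-bit hash, not CRM114's hash
            features.append(zlib.crc32(phrase.encode("utf-8")))
    return features


print(len(sbph_features("cheap diesel engine parts for sale")))  # 47
```

A six-word message yields 1 + 2 + 4 + 8 + 16 + 16 = 47 features. From the fifth word on, each word contributes the full 16.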
>There was just 1 spam that got through in the last week of November:
>a very strange spam written in mixed English and Czech trying to sell
>me diesel engine parts.  It came through on a moto-head email list,
>which I suppose might be slightly topical, and it certainly was amusing,
>rather reminiscent of the Monty Python "camshaft smuggling" skit,
>but it's still spam and counts as such.
>
>This gives an N+1 accuracy of > 99.9% for the entire month of November.
>(99.932% for N-accuracy).  
>
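The two accuracy figures can be checked against the table. N-accuracy is the fraction of messages classified correctly, (5911 - 4 - 0) / 5911. The definition of N+1 accuracy used below is an assumption: it pessimistically treats the next, (N+1)th, message as a misclassification, giving correct / (N + 1). It does reproduce the reported 99.915%.

```python
def n_accuracy(total: int, errors: int) -> float:
    """Percentage of messages classified correctly."""
    return 100.0 * (total - errors) / total


def n_plus_1_accuracy(total: int, errors: int) -> float:
    """Assumed definition: count the next (N+1)th message as an error."""
    return 100.0 * (total - errors) / (total + 1)


total, false_accepts, false_rejects = 5911, 4, 0  # November figures from the table
errors = false_accepts + false_rejects
print(f"{n_accuracy(total, errors):.3f}")         # 99.932
print(f"{n_plus_1_accuracy(total, errors):.3f}")  # 99.915
```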
>So, CRM114 barely squeaked through the month at >99.9%.  Barely.  There's
>clearly still work to be done (the spambayes mailing list is kicking
>around the proper way to evaluate probabilities; I'm looking into some
>of their ideas as well.)
>
>
>
>--- On The Other Hand (the bad news)---
>
>December is looking much worse: TWO have already gotten through over
>the weekend, one "barnyard teen" pornspam (it hasn't seen that before)
>and one very short mortgage solicitation, written folksy-style.
>
>I'm also now getting mailer errors from Sendmail whenever I do
>a "learn"; I'm starting to think that our systems people have
>upgraded something and broken something else in the process. This
>raises the question of whether the CRM114 training code is actually
>getting run at all, or whether the increasing spam rate is
>symptomatic of spam evolving against static filters.
>
>       -Bill Yerazunis



