[Spambayes] CRM114 in November breaks 99.9%. :-)

Bill Yerazunis wsy@merl.com
Mon Dec 2 14:51:33 2002


Ooops, messed up the spreadsheet... corrected statistics below:

Even-More-Final test statistics for CRM114 for November are in:

Standard rules apply: no whitelists, no blacklists, realtime email stream
only (no "canned spam"), train only on errors, polynomial length 5.
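
For readers unfamiliar with "train only on errors", here is a minimal sketch
of the idea: the filter learns from a message only when it has just
misclassified that message. The ToyFilter class, its word-count scoring, and
the sample stream are all hypothetical stand-ins for illustration; this is
not CRM114's actual algorithm or code.

```python
from collections import Counter


class ToyFilter:
    """Hypothetical word-count classifier; NOT CRM114's algorithm."""

    def __init__(self):
        self.counts = {"spam": Counter(), "nonspam": Counter()}

    def classify(self, text):
        # Score each class by how often its learned words appear in the text.
        words = text.lower().split()
        spam_score = sum(self.counts["spam"][w] for w in words)
        nonspam_score = sum(self.counts["nonspam"][w] for w in words)
        # Ties (including the untrained case) default to nonspam.
        return "spam" if spam_score > nonspam_score else "nonspam"

    def learn(self, text, label):
        self.counts[label].update(text.lower().split())


def train_on_errors(filt, stream):
    """Classify each message in arrival order; learn only from mistakes."""
    errors = 0
    for text, true_label in stream:
        if filt.classify(text) != true_label:
            errors += 1
            filt.learn(text, true_label)
    return errors


# Made-up messages, purely for illustration.
stream = [
    ("cheap mortgage offer", "spam"),
    ("meeting notes attached", "nonspam"),
    ("cheap mortgage rates", "spam"),
]
filt = ToyFilter()
errors = train_on_errors(filt, stream)
print("errors:", errors)
```

Correctly classified messages never touch the tables, so the feature counts
grow only as fast as the error rate.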

	 For All of November (starting 9 AM Nov 1, ending 9 AM Dec 1)

    Spams  Nonspams   False     False    Total    N+1 Accuracy   NHC's
                     Accepts   Rejects   Emails       (%)
     1931    3914       4         0       5849      99.914          2

    Spam features in hash tables:    398K
    Nonspam features in hash tables: 299K

There was just 1 spam that got through in the last week of November:
a very strange spam written in mixed English and Czech trying to sell
me diesel engine parts.  It came through on a moto-head email list,
so I suppose it might be slightly topical, and it certainly was amusing
(rather reminiscent of the Monty Python "camshaft smuggling" skit),
but it's still spam and counts as such.

This gives an N+1 accuracy of > 99.9% for the entire month of November.
(99.932% for N-accuracy).  
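
As a sanity check, the two accuracy figures can be recomputed from the table.
This sketch assumes that "N accuracy" is 1 - errors/total and that
"N+1 accuracy" charges one extra hypothetical error; that reading is inferred
from the numbers in this post and is not a definition given in it. The
function and variable names are made up for illustration.

```python
def accuracy_percent(total, errors, extra_errors=0):
    """Percent of messages handled correctly, optionally charging extra errors."""
    return 100.0 * (1.0 - (errors + extra_errors) / total)


# Figures from the November table above.
total_emails = 5849
false_accepts = 4
false_rejects = 0
errors = false_accepts + false_rejects

n_accuracy = accuracy_percent(total_emails, errors)
# Assumption: N+1 accuracy counts one more error than actually observed.
n_plus_1_accuracy = accuracy_percent(total_emails, errors, extra_errors=1)

print(f"N accuracy:   {n_accuracy:.4f}%")
print(f"N+1 accuracy: {n_plus_1_accuracy:.4f}%")
```

Under that assumption both results agree with the quoted 99.932% and 99.914%
to within rounding in the last digit.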

So, CRM114 barely squeaked through the month at >99.9%.  Barely.  There's
clearly still work to be done (the spambayes mailing list is kicking
around the proper way to evaluate probabilities; I'm looking into some
of their ideas as well.)



--- On The Other Hand (the bad news) ---

December is looking much worse - TWO have gotten through already over
the weekend (one "barnyard teen" pornspam, a kind it hasn't seen before,
and one very short mortgage solicitation, written folksy-style).

I'm also getting mailer errors now out of Sendmail whenever I do
a "learn"; I'm starting to think that our systems people have
upgraded something and broken something else in the process. This
raises the question of whether the CRM114 training code is actually
getting run at all, or whether the increasing spam rate is
symptomatic of the evolution of spam against static filters.

       -Bill Yerazunis



