[Spambayes] CRM114 in November breaks 99.9%. :-)

Matt Sergeant msergeant@startechgroup.co.uk
Mon Dec 2 15:22:23 2002


Bill Yerazunis said the following on 02/12/02 14:44:
> Final test statistics for CRM114 for November are in:
> 
> Standard rules apply (no whitelists, no blacklists, realtime email stream
> only (no "canned spam"), train only on errors, polynomial length 5)
> 
> 	 For All of November (starting 9 AM Nov 1, ending 9 AM Dec 1)
> 
>     Spams  Nonspams   False     False     Total   N+1 Accuracy   NHC's
>                      Accepts   Rejects   Emails
>      1993      3914     4         0        5911         99.915     2
> 
>     Spam features in hash tables:    398K
>     Nonspam features in hash tables: 299K

CRM114's learning and classification code looks really interesting, but
its syntax is really freaky to anyone used to regular procedural or OO
languages like Perl, Python, C, etc. Is there *any* chance the learning
and classifying library in crm114 can be extracted into a plain .so?
That would be tremendous, and I'd be willing to build a Perl XS library
for it in a heartbeat.

If not, we'll just have to try and copy the sparse binary polynomial 
hash idea ;-)
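
A minimal sketch of what copying the sparse binary polynomial hash idea
might look like, assuming the usual description of the technique: slide a
window of 5 tokens over the message (the "polynomial length 5" in the
quoted rules) and hash every sub-phrase that keeps the newest token while
keeping or skipping each of the older ones. The name `sbph_features`, the
`<pad>` and `<skip>` placeholders, and the choice of CRC32 as the hash are
illustrative assumptions, not CRM114's actual code.

```python
import zlib
from itertools import product

WINDOW = 5  # assumed to correspond to "polynomial length 5" in the quoted rules


def sbph_features(tokens, window=WINDOW):
    """Yield one integer hash per sparse sub-phrase of each sliding window."""
    for end in range(len(tokens)):
        # The last `window` tokens ending at position `end`, left-padded.
        chunk = tokens[max(0, end - window + 1): end + 1]
        chunk = ["<pad>"] * (window - len(chunk)) + chunk
        older, newest = chunk[:-1], chunk[-1]
        # Every keep/skip mask over the older tokens; the newest is always kept,
        # so each token position yields 2 ** (window - 1) features.
        for mask in product((False, True), repeat=window - 1):
            phrase = [tok if keep else "<skip>" for tok, keep in zip(older, mask)]
            phrase.append(newest)
            # Hypothetical hash choice: CRC32 of the space-joined phrase.
            yield zlib.crc32(" ".join(phrase).encode("utf-8"))


features = list(sbph_features("click here to claim your free prize".split()))
```

With a window of 5, each token contributes 16 features, which would go
some way to explaining feature counts in the hundreds of thousands. A
classifier built on this would count how often each hash shows up in spam
versus nonspam; that part is not sketched here.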




