[Spambayes] Are they learning?

Bill Yerazunis wsy at merl.com
Sun Feb 16 11:11:47 EST 2003

Re: the cloaked Viagra spam: I clipped off all of the headers with any
reference to the SpamBayes mailing list, and fed the remainder to CRM114.

As suspected, the phrasing was a dead giveaway even if the keywords
are cloaked.

The actual result was:


   Probabalistic match quality: 1.000000
   P(succ): 1.000000e-00, P(fail):2.070405e-11 
   S hits : 36578, F hits : 33524 

("Pass" and "fail" here are inverted due to an unfortunate
historical accident of programming: "pass" in the P-stats (the
P(succ) value) means "matches spam better", while "fail" in the
capital letters at the top _also_ means "matches spam better".
Don't let it worry you.)

Note that even a simple event counter would have gotten this one
right, because the number of corpus text hits was roughly 9% higher
for the correct categorization than for the incorrect one (36578 vs.
33524 hits).
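A "simple event counter" of the kind mentioned above could look like the
sketch below. It assumes that "S hits" are hits against the spam corpus
(the side CRM114 reported as the better match here) and "F hits" are hits
against the other corpus; the function name `simple_counter_verdict` is
hypothetical and is not part of CRM114.

```python
# Hit counts copied from the CRM114 output above.
# Assumption: "S hits" = spam-corpus hits, "F hits" = other-corpus hits.
spam_hits = 36578
other_hits = 33524


def simple_counter_verdict(spam_hits: int, other_hits: int) -> str:
    """Hypothetical majority rule: pick whichever corpus got more hits."""
    return "spam" if spam_hits > other_hits else "nonspam"


# Relative margin of the winning side over the losing side.
margin = (spam_hits - other_hits) / other_hits

print(simple_counter_verdict(spam_hits, other_hits))  # spam
print(f"{margin:.1%}")                                # 9.1%
```

Such a rule ignores everything CRM114's probabilistic combination does
with the individual features. It only shows that the raw hit totals already
point the right way for this message.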

       -Bill Y.

