[Spambayes] Are they learning?
wsy at merl.com
Sun Feb 16 11:11:47 EST 2003
Re: the cloaked Viagra spam: I clipped off all of the headers with any
reference to the SpamBayes mailing list, and fed the remainder to CRM114.
As suspected, the phrasing was a dead giveaway even if the keywords
weren't.
The actual result was:
**CRM114 FAIL SBPH/BCR TEST**
Probabalistic match quality: 1.000000
P(succ): 1.000000e-00, P(fail):2.070405e-11
S hits : 36578, F hits : 33524
("succ" and "fail" here are inverted due to an unfortunate
historical accident of programming - "succ" in the P-stats means
"matches spam better", while "FAIL" in the capital letters at the top
_also_ means "matches spam better". Don't let it worry you.)
Note that even a simple event counter would have gotten this one
right, as the number of corpus text hits was about 10% higher for
the correct categorization than for the incorrect categorization.
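
For the curious, the event-counter comparison can be sketched in a few
lines of Python. This is only an illustration of the arithmetic, not
anything CRM114 itself runs; the function name count_vote is made up
here, and it assumes "S hits" counts matches against the spam corpus
and "F hits" counts matches against the non-spam corpus.

```python
# Hit counts copied from the CRM114 output quoted in this message.
# Assumption: "S hits" = spam-corpus matches, "F hits" = non-spam matches.
s_hits = 36578
f_hits = 33524


def count_vote(s, f):
    """Naive event counter (hypothetical helper): the class with more hits wins."""
    return "spam" if s > f else "nonspam"


# Relative excess of spam hits over non-spam hits.
margin = (s_hits - f_hits) / f_hits

print(count_vote(s_hits, f_hits))  # spam
print(f"{margin:.1%}")             # 9.1%
```

The margin works out to roughly 9%, which is the "about 10%" mentioned
above.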