[Spambayes] Bytes/words ratio
tim.one at comcast.net
Sat Mar 8 00:05:01 EST 2003
> Skip's bytes/words metatoken seems to be a bust.
> I take (mild) exception to that. It was TimP's idea. Perhaps I
> implemented it wrong. ;-) Also, note that Tim indicated it helped in his
> early testing.
Nope, I said it was a strong spam indicator, but that it made no difference
to error rates. That's the same outcome Alex just reported (I didn't see a
significant difference in his before-and-after results; no change in FP or
FN, and (just) a few msgs tipped into Unsure).
Another example may help to clarify: in just about anyone's test data,
"<br>" would be a very strong spam indicator, if the tokenizer produced it.
I expect that adding it into the mix would boost the FP rate, though -- at
least for those of us with sisters <wink>.
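
To make the distinction concrete, here is a toy sketch of how a token can be a strong spam indicator without improving error rates. Everything in it is an assumption for illustration: the counts for "<br>" are made up, the other clue probabilities are invented, the 0.20/0.90 cutoffs are assumed, and the log-odds combining is a plain naive-Bayes stand-in, not the combining scheme Spambayes actually uses.

```python
from math import exp, log

def token_spamprob(spam_hits, ham_hits, nspam, nham):
    """Rough P(spam | token) from how often the token shows up in each corpus."""
    spam_ratio = spam_hits / nspam
    ham_ratio = ham_hits / nham
    return spam_ratio / (spam_ratio + ham_ratio)

def combine(probs):
    """Naive-Bayes style combination: sum the log-odds, map back to [0, 1]."""
    log_odds = sum(log(p) - log(1.0 - p) for p in probs)
    return 1.0 / (1.0 + exp(-log_odds))

# Hypothetical training counts: "<br>" in 900 of 1000 spam, 50 of 1000 ham.
br = token_spamprob(900, 50, 1000, 1000)

# Hypothetical clue probabilities for two messages, before "<br>" is added.
spam_clues = [0.99, 0.95, 0.90]   # a spam that is already obvious
ham_clues = [0.20, 0.30, 0.60]    # an HTML-formatted ham from a relative

spam_before = combine(spam_clues)
spam_after = combine(spam_clues + [br])
ham_before = combine(ham_clues)
ham_after = combine(ham_clues + [br])

print("spamprob('<br>'):", round(br, 3))
print("spam msg:", round(spam_before, 3), "->", round(spam_after, 3))
print("ham msg: ", round(ham_before, 3), "->", round(ham_after, 3))
```

With these made-up numbers the spam was already far past the assumed spam cutoff, so the new clue changes nothing for it, while the ham gets pushed out of the ham range. A clue only helps error rates if it moves messages the other clues were getting wrong.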