[Spambayes] Bytes/words ratio

Tim Peters tim.one at comcast.net
Sat Mar 8 00:05:01 EST 2003

> Skip's bytes/words metatoken seems to be a bust.

> I take (mild) exception to that.  It was TimP's idea.  Perhaps I
> implemented it wrong. ;-) Also, note that Tim indicated it helped in his
> early testing.

Nope, I said it was a strong spam indicator, but that it made no difference
to error rates.  That's the same outcome Alex just reported (I didn't see a
significant difference in his before-and-after results; no change in FP or
FN, and (just) a few msgs tipped into Unsure).
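
A toy sketch of why a strong clue can leave error rates alone.  This is
not the real combining scheme; it's a plain naive-Bayes product, and the
clue strengths, the 0.95 for the metatoken, and the 0.20/0.90 cutoffs are
all assumptions made up for illustration:

```python
from math import prod  # Python 3.8+


def combine(probs):
    """Naive-Bayes style combination of per-token spam probabilities."""
    spam = prod(probs)
    ham = prod(1.0 - p for p in probs)
    return spam / (spam + ham)


def classify(score, ham_cutoff=0.20, spam_cutoff=0.90):
    """Three-way call; the cutoffs here are assumed values."""
    if score < ham_cutoff:
        return "ham"
    if score > spam_cutoff:
        return "spam"
    return "unsure"


# Hypothetical per-token spam probabilities for three messages.
clear_spam = [0.99, 0.97, 0.95, 0.90]
clear_ham = [0.02, 0.05, 0.10, 0.03]
borderline_ham = [0.10, 0.20, 0.40]

RATIO_CLUE = 0.95  # hypothetical strength of the bytes/words metatoken

for name, clues in [("clear_spam", clear_spam),
                    ("clear_ham", clear_ham),
                    ("borderline_ham", borderline_ham)]:
    before = classify(combine(clues))
    after = classify(combine(clues + [RATIO_CLUE]))
    print(name, before, "->", after)
```

Under these made-up numbers, the clear-cut msgs already have plenty of
evidence, so one more strong clue doesn't move them across a cutoff; only
the borderline msg changes, and it lands in Unsure.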

Another example may help to clarify:  in just about anyone's test data,
"<br>" would be a very strong spam indicator, if the tokenizer produced it.
I expect that adding it into the mix would boost the FP rate, though -- at
least for those of us with sisters <wink>.

