[Spambayes] RE: solution for the "spam of the future"?
kennypitt at hotmail.com
Tue Dec 16 17:27:05 EST 2003
Coe, Bob wrote:
> Don't start generating the "Missing: N" token until the database is
> large enough for it to make sense.
If this works at all, it also seems like the *percentage* of unknown
word tokens in the message would work better than a log()'d count. A
very large newsletter is pretty much guaranteed to have a higher *count*
of unknown tokens than a short mailing list message, but that's because
it has more total tokens and not because it's any spammier.
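
To make the comparison concrete, here is a rough sketch in Python. It is not Spambayes code: the function name, the made-up vocabularies, and the choice to count each distinct token once are all assumptions for illustration, and log2 bucketing is only one guess at what a "log()'d count" might look like.

```python
import math

def unknown_token_stats(tokens, known_tokens):
    """Return (count, log2 bucket, fraction) of distinct tokens not in known_tokens.

    Hypothetical helper: 'known_tokens' stands in for the set of words
    already present in the training database.
    """
    unique = set(tokens)          # assumption: each distinct token counts once
    if not unique:
        return 0, 0, 0.0
    unknown = sum(1 for t in unique if t not in known_tokens)
    log_bucket = int(math.log2(unknown + 1))   # one possible log()'d count
    return unknown, log_bucket, unknown / len(unique)

# Made-up data: 100 "known" words, then a short message and a long newsletter
# that both have 20% unknown tokens.
known = {f"k{i}" for i in range(100)}
short_msg = ["k0", "k1", "k2", "k3", "u0"]
newsletter = [f"k{i}" for i in range(80)] + [f"u{i}" for i in range(20)]

short_stats = unknown_token_stats(short_msg, known)
long_stats = unknown_token_stats(newsletter, known)
print(short_stats)
print(long_stats)
```

With this data the newsletter has 20 unknown tokens against 1 for the short message, so the raw count and its log bucket differ sharply, while the fraction of unknown tokens comes out the same for both.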