[Spambayes] spambayes fronting a mailing list?

Tim Peters tim.one at comcast.net
Sun Jan 19 00:27:10 EST 2003

> Doesn't it take time before the first spam arrives on a brand
> new mailinglist? Spambayes' results are going to be real
> lousy if it is trained on 200 ham and 0 spam messages....

> It might be, but how will that lousiness manifest?

It depends a lot on whether you enable the bool
experimental_ham_spam_imbalance_adjustment option.  If it's true, and you
have no spam, every msg will score exactly 0.5.

> As false negatives?

If experimental_ham_spam_imbalance_adjustment is false (still the default,
since I haven't touched the code since the option was introduced), yes.
Every word in the database will be associated with ham, so nothing is
evidence for spam.
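
A rough sketch of both cases, assuming a Robinson-style per-word estimate.
The function name, the s=0.45 and x=0.5 defaults, and the handling of zero
counts are illustrative assumptions here, not a copy of the classifier code:

```python
def word_spam_prob(hamcount, spamcount, nham, nspam,
                   imbalance_adjustment=False, s=0.45, x=0.5):
    """Hypothetical per-word spam probability (illustration only).

    hamcount/spamcount: # of trained ham/spam msgs containing the word.
    nham/nspam: total # of trained ham/spam msgs.
    s, x: assumed strength and prior probability for unknown words.
    """
    ham_ratio = hamcount / nham if nham else 0.0
    spam_ratio = spamcount / nspam if nspam else 0.0
    total = ham_ratio + spam_ratio
    raw = spam_ratio / total if total else x

    if imbalance_adjustment:
        # Discount the evidence from whichever class is over-represented.
        spam2ham = min(nspam / nham, 1.0) if nham else 0.0
        ham2spam = min(nham / nspam, 1.0) if nspam else 0.0
    else:
        spam2ham = ham2spam = 1.0

    # n is the effective amount of evidence; blend raw with the prior x.
    n = hamcount * spam2ham + spamcount * ham2spam
    return (s * x + n * raw) / (s + n)


# A word seen in 50 of 200 ham msgs, with no spam trained at all.
print(word_spam_prob(50, 0, 200, 0))                             # near 0
print(word_spam_prob(50, 0, 200, 0, imbalance_adjustment=True))  # 0.5
```

With the adjustment on and zero spam, the effective evidence n is 0 for
every word, so each word falls back to the prior x, which is consistent with
every msg scoring 0.5.  With it off, every known word sits near 0, so spam
slides through as false negatives.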

> If so, the -spam reporting address for the list should
> eventually warm up the spam side, right?

Yes it will.  It's best to shoot for the same # of ham and spam, if for no
other reason than that then experimental_ham_spam_imbalance_adjustment has
no effect either way <0.6 wink>.

> Depending on how much your legitimate list traffic looks like spam
> already, it might warm up pretty quickly.

It won't look like spam.  Even if it "looks like spam" to human eyes, the
classifier will find many strong differences, some of which people will
never think of.  Hell, some differences people will even argue about, but
it's futile -- real-life data doesn't lie about real life <wink>.

