[Spambayes] Obvious Spam Missed...

Tim Peters tim.one at comcast.net
Tue Sep 16 16:11:44 EDT 2003

[Ryan Malayter]
> Tim, can you quantify "a large imbalance"? Is a 2:1 ratio significant
> enough to cause problems,

Probably not.

> or would it have to be something on the order of 10:1?

The larger the worserer <wink>.  We've seen people with imbalance worse than
300:1 or 1:300 (different people get extreme in different ways).

> I'm getting a 98% capture rate after training with a 2:1 ham-to-spam
> ratio, but other folks seem to be having troubles.

Other folks are having troubles, but in each such case to date where they've
stuck around long enough to answer the questions, it turns out they've had
extreme imbalance in their training data.  Of those, fewer still have stuck
around long enough to report back that disabling the
experimental_ham_spam_imbalance option helped.

That's all I've got to go on, and over on spambayes-dev we're talking about
changing this option as a result.  Do you happen to know the training ratios
for the "other folks [with] troubles" you know about?
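
If it helps anyone report their numbers, here is a minimal sketch for turning
raw training counts into the N:1 or 1:N form used above.  It assumes you
already know how many ham and spam you trained on.  The function name
training_ratio is made up for illustration and is not part of spambayes.

```python
def training_ratio(nham, nspam):
    """Format ham and spam training counts as a ham:spam ratio string.

    Hypothetical helper for illustration only; not a spambayes API.
    The larger count is always scaled against the smaller one, so the
    result reads either 'N:1' (more ham) or '1:N' (more spam).
    """
    if nham <= 0 or nspam <= 0:
        raise ValueError("need at least one ham and one spam to form a ratio")
    if nham >= nspam:
        return "%.1f:1" % (nham / nspam)
    return "1:%.1f" % (nspam / nham)


if __name__ == "__main__":
    # Example counts are invented, chosen to match the ratios in this thread.
    print(training_ratio(2000, 1000))
    print(training_ratio(10, 3000))
```

This only reports the ratio.  It deliberately doesn't try to say where
"too imbalanced" starts, since all we know so far is that 2:1 is probably
fine and that 300:1 in either direction has been trouble.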

