> Total emails trained: Spam: 13126 Ham: 2479
> More statistics:
>     SpamBayes has processed 9009 messages - 1566 (17%) good, 
> 7131 (79%) spam and 312 (3%) unsure.
>     648 messages were manually classified as good (0 were 
> false positives).
>     2835 messages were manually classified as spam (1 was a 
> false negative).
>     9 unsure messages were manually identified as good, and 
> 147 as spam.
> (I presume the difference in numbers, e.g. 13126 vs. 7131, is due to the
> fact that I have gone through several different versions)

Most probably, yes.  Older versions didn't keep the same information that
newer ones do, which would explain the mismatch.

> Warning: you have much more spam than ham - SpamBayes works 
> best with approximately even numbers of ham and spam.
> ... which leads me to the one bit of feedback I might give 
> you.  Given what the world is coming to, 50/50 spam vs. ham 
> appears to me a pipe-dream.  If different tuning of SpamBayes 
> for more unfavorable ratios is possible, perhaps this would 
> be a good idea.

We're definitely aware of this - unfortunately, the main attempt we've made
so far at correcting imbalances turned out to hurt more than help.  There
are some other ideas around at the moment, though, and this should be
addressed in the first 1.1 version.

> But having said that, even with my 5:1 
> ratio, I am very happy.  What bliss SpamBayes' "best" must be.

:)  How much of an imbalance causes a problem is very much an individual
thing.  The warning appears at 1:5 or 5:1, as this is around where
problems *might* start appearing, and early enough that people can change
their training habits, if possible.  The biggest examples of problems are
when people have ratios more like 100:1 (or worse!).
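
To make the threshold concrete, here is a rough sketch of that kind of
check.  It is not the actual SpamBayes code: the function names, the
default threshold of 5.0 and the exact wording of the message are all
assumptions made for illustration.

```python
# Illustrative only: hypothetical helpers, not taken from SpamBayes itself.

def training_ratio(nspam, nham):
    """Return the larger training count divided by the smaller one."""
    larger, smaller = max(nspam, nham), min(nspam, nham)
    if larger == 0:
        return 1.0           # nothing trained yet, so nothing is imbalanced
    if smaller == 0:
        return float("inf")  # only one class has been trained
    return larger / smaller


def imbalance_warning(nspam, nham, threshold=5.0):
    """Return a warning string at or beyond the threshold, else None.

    The threshold of 5.0 is an assumed value standing in for the
    1:5 / 5:1 point mentioned above.
    """
    if training_ratio(nspam, nham) < threshold:
        return None
    if nspam > nham:
        more, less = "spam", "ham"
    else:
        more, less = "ham", "spam"
    return "Warning: you have much more %s than %s" % (more, less)


# The training counts quoted above: 13126 spam, 2479 ham (roughly 5.3:1).
print(imbalance_warning(13126, 2479))
```

With your counts the ratio is just over 5:1, which is why you see the
warning even though things are working well for you.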

Thanks for the feedback!

=Tony Meyer

