[Spambayes] RE: spam detection via probability - actual results!

Tim Peters tim.one@comcast.net
Thu, 19 Sep 2002 23:57:21 -0400


BTW, in those last score histograms it's clear that the ham distribution has
lower variance than the spam distribution.  I've seen that before in a
different experiment that produced (what looked to be) normally distributed
ham and spam scores.  In that case, the bulk of the difference was explained
by the fact that we still have hambias set to 2 (we still artificially double
the count on ham words).  When I tried that older experiment again after
removing the hambias (which is the only deliberate bias we haven't yet
removed from Graham's original scheme), it was the spam distribution that
turned out to be tighter.  I expect, but don't know, that the same would be
true here.
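
For anyone who hasn't looked at where the bias enters, here is a minimal
sketch of a Graham-style per-word spam probability with the ham count
multiplied by hambias.  It is an illustration only, not the project's actual
code: the function name, the argument names, the 0.01/0.99 clamps and the 0.5
value for never-seen words are all assumptions made for the example.

```python
def graham_spamprob(ham_count, spam_count, nham, nspam,
                    hambias=2.0, unknown=0.5):
    """Hypothetical Graham-style spam probability for a single word.

    ham_count / spam_count: number of ham / spam messages containing the word.
    nham / nspam: total number of ham / spam messages trained on.
    hambias: multiplier applied to the ham count (2.0 doubles it).
    unknown: value returned for a word never seen in training (assumed).
    """
    if ham_count == 0 and spam_count == 0:
        return unknown
    # Each ratio is capped at 1.0 so a biased count cannot exceed "every message".
    ham_ratio = min(1.0, hambias * ham_count / nham)
    spam_ratio = min(1.0, spam_count / nspam)
    prob = spam_ratio / (ham_ratio + spam_ratio)
    # Clamp away from 0 and 1 so no single word is treated as certain.
    return max(0.01, min(0.99, prob))


if __name__ == "__main__":
    # A word seen equally often in ham and spam, with and without the bias.
    print(graham_spamprob(10, 10, 100, 100, hambias=2.0))
    print(graham_spamprob(10, 10, 100, 100, hambias=1.0))
```

With hambias at 2, a word that appears equally often in ham and in spam is
scored as leaning ham; with hambias at 1 it comes out neutral.  The sketch
shows only the per-word bias and says nothing by itself about the variance of
the final message scores.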