[Spambayes] Use for gray area in scoring range

Guido van Rossum guido@python.org
Sun, 22 Sep 2002 02:16:01 -0400


> That's mostly because it *has* a "middle ground"

I was aware of that when I first said I didn't want a "maybe" scoring
category; what made me change my mind was trying to tune the knob.

> > Specifying a gray area gives you a useful tool to see if you have
> > set your cutoff right.
> 
> The practical difficulty here is that the gray area we're observing
> is very narrow: the difference between setting the cutoff at 0.5 or
> 0.575 in my large test was the difference between seeming disaster
> and "does just as well as our Graham-like scheme".  Move it to 0.60,
> and it heads back to disasterland again.

I haven't found the right setting for me yet.  A cutoff of 0.575 did
better than 0.55, but still much worse on false positives than Graham.
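
As a rough illustration of this kind of trial and error, here is a
minimal Python sketch that counts false positives and false negatives
at a few candidate cutoffs.  The scores are made up for the example,
and count_errors() and sweep() are hypothetical helper names, not part
of the spambayes code; the sketch also assumes that a message scoring
at or above the cutoff is called spam.

```python
def count_errors(ham_scores, spam_scores, cutoff):
    """Return (false_positives, false_negatives) for one cutoff.

    Assumption: a message scoring at or above `cutoff` is called spam.
    """
    fp = sum(1 for s in ham_scores if s >= cutoff)   # ham called spam
    fn = sum(1 for s in spam_scores if s < cutoff)   # spam called ham
    return fp, fn


def sweep(ham_scores, spam_scores, cutoffs):
    """Map each candidate cutoff to its (fp, fn) counts."""
    return {c: count_errors(ham_scores, spam_scores, c) for c in cutoffs}


# Made-up scores, purely illustrative; real runs would use test output.
ham_scores = [0.10, 0.30, 0.52, 0.56, 0.58]
spam_scores = [0.54, 0.59, 0.61, 0.70, 0.90]

results = sweep(ham_scores, spam_scores, [0.50, 0.55, 0.575, 0.60])
for cutoff, (fp, fn) in sorted(results.items()):
    print("cutoff %.3f: %d fp, %d fn" % (cutoff, fp, fn))
```

With scores bunched this tightly, a small move of the cutoff trades
false positives for false negatives quickly, which is the sensitivity
being described.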

> It's great to have the knob, but it's sensitive, and so far we've no
> idea how to choose it short of trial and error (it's easy to choose
> if you've got the score histograms to stare at, but end users
> won't).

Plus, we don't know why the best cutoff isn't 0.5, right?

Might that have to do with the spam/ham ratio?  I've got 83 hams for
each 32 spams.

--Guido van Rossum (home page: http://www.python.org/~guido/)