[Spambayes] There Can Be Only One

Tim Peters tim.one@comcast.net
Wed, 25 Sep 2002 21:38:45 -0400


One more run, starting over from scratch with a new seed, but setting

robinson_probability_a: 0.225

Graham vs that:

grahams -> fofwbases
-> <stat> tested 200 hams & 200 spams against 1800 hams & 1800 spams
   ...

false positive percentages
    0.000  0.000  tied
    0.000  0.000  tied
    0.000  0.000  tied
    0.000  0.000  tied
    0.500  0.500  tied
    0.000  0.500  lost  +(was 0)
    0.000  0.000  tied
    0.000  0.000  tied
    0.000  0.000  tied
    0.500  0.500  tied

won   0 times
tied  9 times
lost  1 times

total unique fp went from 2 to 3 lost   +50.00%
mean fp % went from 0.1 to 0.15 lost   +50.00%

false negative percentages
    0.500  0.000  won   -100.00%
    0.000  0.000  tied
    0.000  0.000  tied
    0.000  0.000  tied
    0.000  0.000  tied
    0.000  0.000  tied
    0.000  0.000  tied
    0.000  0.000  tied
    1.000  0.000  won   -100.00%
    0.500  0.000  won   -100.00%

won   3 times
tied  7 times
lost  0 times

total unique fn went from 4 to 0 won   -100.00%
mean fn % went from 0.2 to 0.0 won   -100.00%
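
For anyone checking the arithmetic in the summaries above, here is a small
sketch of how the per-run percentages relate to the totals and means.  The
function names are hypothetical (they are not part of the test driver), and
it assumes each run tests 200 hams and 200 spams, and that no message is
tested in more than one run, so summing per-run counts gives the unique
total.

```python
def summarize(percentages, messages_per_run):
    """Return (total_errors, mean_percentage) for a list of per-run
    error percentages.  Hypothetical helper, for illustration only."""
    # Convert each percentage back to a whole number of messages.
    counts = [round(p / 100.0 * messages_per_run) for p in percentages]
    mean_pct = sum(percentages) / len(percentages)
    return sum(counts), mean_pct


def pct_change(old, new):
    """Relative change in percent, or None when the old value is 0
    (the listing above shows that case as "+(was 0)")."""
    if old == 0:
        return None
    return 100.0 * (new - old) / old


# Per-run percentages copied from the two listings above.
graham_fp = [0, 0, 0, 0, 0.5, 0, 0, 0, 0, 0.5]
gary_fp = [0, 0, 0, 0, 0.5, 0.5, 0, 0, 0, 0.5]
graham_fn = [0.5, 0, 0, 0, 0, 0, 0, 0, 1.0, 0.5]
gary_fn = [0] * 10

for name, before, after in (("fp", graham_fp, gary_fp),
                            ("fn", graham_fn, gary_fn)):
    old_total, old_mean = summarize(before, 200)
    new_total, new_mean = summarize(after, 200)
    print(name, old_total, new_total, pct_change(old_total, new_total),
          old_mean, new_mean)
```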

Boosting spam_cutoff to 0.56 (it was 0.55) would have been a pure win for
Gary's scheme, chopping one fp.  Boosting it to 0.585 would have chopped
another fp, but at the cost of 3 new fn.  That would have left it with 1 fp
and 3 fn, vs 2 and 4 for Graham.
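
To make the cutoff tradeoff concrete, here is a minimal sketch of counting
fp and fn at several cutoffs.  The scores are invented purely so that the
counts match the ones quoted above; they are NOT the real scores from this
run.  The helper name is hypothetical, and it assumes a message is called
spam when its score is strictly greater than the cutoff.

```python
def count_errors(ham_scores, spam_scores, cutoff):
    """Return (fp, fn) at the given cutoff.

    Assumption: a message is classified as spam when score > cutoff.
    """
    fp = sum(1 for s in ham_scores if s > cutoff)     # ham called spam
    fn = sum(1 for s in spam_scores if s <= cutoff)   # spam called ham
    return fp, fn


# Made-up scores, chosen only to reproduce the counts in the text:
# three hams above 0.55, and three spams between 0.56 and 0.585.
ham_scores = [0.02, 0.10, 0.555, 0.57, 0.93]
spam_scores = [0.565, 0.575, 0.58, 0.97, 0.99]

for cutoff in (0.55, 0.56, 0.585):
    fp, fn = count_errors(ham_scores, spam_scores, cutoff)
    print("cutoff %.3f: %d fp, %d fn" % (cutoff, fp, fn))
```

With these made-up scores the three cutoffs give 3 fp and 0 fn, then 2 fp
and 0 fn, then 1 fp and 3 fn, which is the pattern described above.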

It's not really fair to keep fiddling spam_cutoff after the fact, so it
remains disturbing that "one size fits all" doesn't work here.  OTOH, the range
across which people fiddle spam_cutoff spans about 0.10, and relatively few
messages live there, so if you believe in middle grounds, this has a useful
one.

Does anyone else intend to participate in this death match?  If not, I'll
tally results tomorrow.  My general *impression* (subject to change) is that
Gary's scheme is doing at least as well, and often slightly better.  At
least two people have said (offline) that they're more comfortable with the
false positive mistakes it makes -- that they make "more sense" than the
ones the Graham scheme produces.  I haven't experienced that in this
particular test; e.g., the two fp under Graham in the run above were also fp
under Gary's scheme.  In the first run I reported here, both schemes had a
single fp, and again it was the same one.  It seems that very brief msgs are hard
to score for any scheme (but my hope remains that mining more header lines
will help that -- we may even want to *presume* that a very brief msg is
ham, in the absence of strong evidence to the contrary?).