[Spambayes] Need testers!

Tim Peters tim.one@comcast.net
Sun, 15 Sep 2002 03:45:47 -0400


[Tim]
> 1. Run a baseline and save a summary file (rates.py).
>
> 2. Make exactly one change, adding
>
> [Classifier]
> adjust_probs_by_evidence_mass: True
>
>    to your bayescustomize.ini file.
>
> 3. Run the same test scenario again, and create another summary file.
>
> 4. Run cmp.py over the summary files and post the cmp.py output
>    (all of it, please).  Or mail it to me, but I think there's value
>    in public humiliation <wink> if it helps get better results.

This worked well for my very large training sets, but was a clear loss when
I ran on smaller subsets.  I've checked in another version that repaired
this for me.  To try it, do cvs up and change step #2 to this:

[Classifier]
adjust_probs_by_evidence_mass: True
min_spamprob: 0.001
max_spamprob: 0.999
hambias: 1.5
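
Before kicking off a long run, it can be worth checking that the fragment parses and that the values are what you meant. The following is only a minimal sketch using the standard library's configparser; it is not how Spambayes itself loads its options, and the `FRAGMENT` and `settings` names are made up for illustration.

```python
import configparser

# The fragment from this message, as it might appear in bayescustomize.ini.
FRAGMENT = """\
[Classifier]
adjust_probs_by_evidence_mass: True
min_spamprob: 0.001
max_spamprob: 0.999
hambias: 1.5
"""

parser = configparser.ConfigParser()
parser.read_string(FRAGMENT)
section = parser["Classifier"]

# Convert each option to the type the values suggest (assumed, not
# taken from the Spambayes option definitions).
settings = {
    "adjust_probs_by_evidence_mass": section.getboolean("adjust_probs_by_evidence_mass"),
    "min_spamprob": section.getfloat("min_spamprob"),
    "max_spamprob": section.getfloat("max_spamprob"),
    "hambias": section.getfloat("hambias"),
}

# The clamp range should be a proper sub-interval of (0, 1).
assert 0.0 < settings["min_spamprob"] < settings["max_spamprob"] < 1.0
print(settings)
```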


On my giant

-> <stat> tested 2000 hams & 1375 spams against 18000 hams & 12375 spams

10-fold c-v run, this is the difference:

false positive percentages
    0.000  0.000  tied
    0.000  0.000  tied
    0.000  0.000  tied
    0.000  0.000  tied
    0.050  0.050  tied
    0.000  0.050  lost  +(was 0)
    0.000  0.000  tied
    0.050  0.050  tied
    0.000  0.000  tied
    0.100  0.050  won    -50.00%

won   1 times
tied  8 times
lost  1 times

total unique fp went from 4 to 4 tied
mean fp % went from 0.02 to 0.02 tied

false negative percentages
    0.218  0.145  won    -33.49%
    0.364  0.364  tied
    0.000  0.073  lost  +(was 0)
    0.218  0.218  tied
    0.218  0.218  tied
    0.291  0.145  won    -50.17%
    0.218  0.073  won    -66.51%
    0.145  0.145  tied
    0.291  0.218  won    -25.09%
    0.073  0.000  won   -100.00%

won   5 times
tied  4 times
lost  1 times

total unique fn went from 28 to 22 won    -21.43%
mean fn % went from 0.203636363636 to 0.16 won    -21.43%
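
The percentages in the output above are consistent with a plain relative change from the first column to the second. A minimal sketch of that arithmetic follows; the `pct_change` helper is hypothetical and is not taken from cmp.py, which prints "+(was 0)" rather than a percentage when the baseline is 0.

```python
def pct_change(before, after):
    """Relative change from before to after, in percent.

    A negative result means the rate went down, i.e. an improvement.
    """
    if before == 0:
        # A relative change from a zero baseline is undefined.
        raise ValueError("baseline is 0")
    return 100.0 * (after - before) / before

# Total unique false negatives: 28 -> 22.
print("%.2f%%" % pct_change(28, 22))        # -21.43%
# First false-negative row: 0.218 -> 0.145.
print("%.2f%%" % pct_change(0.218, 0.145))  # -33.49%
```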


On much smaller random-subset 10-fold c-v runs with

-> <stat> tested 300 hams & 300 spams against 2700 hams & 2700 spams

the effect is usually a significant decrease in the f-n rate, and a random
bump up or down in the f-p stats similar to the one in the huge test.
Dropping the value of hambias probably accounts for the f-n goodness; the
rest is largely to prevent f-p badness at the same time.  Expanding the
min/max-prob range, coupled with taking into account the number of msgs
that go into each probability estimate, all but eliminates "massive
cancellation" of MIN_SPAMPROB and MAX_SPAMPROB clues.
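
To illustrate what cancellation means here, the sketch below assumes Graham-style combining of clue probabilities, P = prod(p) / (prod(p) + prod(1 - p)), and assumes clamp values of 0.01 and 0.99 for the narrower range. Both are assumptions for the sake of the example, and the code does not reproduce the actual evidence-mass adjustment that was checked in; `graham_combine` is a made-up name.

```python
from math import prod  # Python 3.8+

def graham_combine(probs):
    """Combine per-clue spam probabilities Graham-style (assumed scheme)."""
    p = prod(probs)
    q = prod(1.0 - x for x in probs)
    return p / (p + q)

# Every strong clue pinned to the same two extremes: eight spam clues and
# eight ham clues cancel, no matter how much evidence backs each one.
cancelled = graham_combine([0.99] * 8 + [0.01] * 8)

# If strong clues can take different values (a wider range, with better
# supported clues allowed closer to the edge), equal counts no longer cancel.
uneven = graham_combine([0.999] * 3 + [0.01] * 3)

print("cancelled: %.3f" % cancelled)  # cancelled: 0.500
print("uneven:    %.3f" % uneven)     # uneven:    0.999
```

With a single pair of clamp values, a message's score is decided by a bare count of extreme clues, which is why equal counts leave it sitting at 0.5.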