[Spambayes] Need testers!

Tim Peters tim.one@comcast.net
Sun, 15 Sep 2002 18:30:54 -0400


[Neil Schemenauer]
> The results for 6 sets with 300 in each:

Thanks!  Was this the earlier-suggested

    [Classifier]
    adjust_probs_by_evidence_mass: True

experiment, or the later-suggested

    [Classifier]
    adjust_probs_by_evidence_mass: True
    min_spamprob: 0.001
    max_spamprob: 0.999
    hambias: 1.5

experiment?
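For anyone who hasn't followed the earlier threads, here is a rough sketch of what `hambias` and the `min_spamprob`/`max_spamprob` clamps do to one word's spam probability. It assumes a Graham-style per-word formula. For each corpus, take the fraction of training messages that contain the word, scale the ham fraction up by `hambias`, and clamp the resulting ratio. Graham's original used a factor of 2 on ham and clamps of 0.01/0.99. The function name `word_spamprob` is hypothetical and this is not the classifier's actual code. It also makes no attempt to model `adjust_probs_by_evidence_mass`, whose details aren't given here.

```python
def word_spamprob(spamcount, hamcount, nspam, nham,
                  hambias=1.5, min_spamprob=0.001, max_spamprob=0.999):
    """Graham-style spam probability for one word (illustrative sketch).

    spamcount/hamcount: how many spam/ham training messages contain the word.
    nspam/nham: total spam/ham training messages.
    """
    # Fraction of each corpus containing the word; ham evidence is scaled
    # up by hambias, which pushes borderline words toward "ham".
    hamratio = min(1.0, hamcount * hambias / nham)
    spamratio = min(1.0, spamcount / nspam)
    if hamratio + spamratio == 0:
        # Assumption: a word with no evidence at all is treated as neutral.
        return 0.5
    prob = spamratio / (hamratio + spamratio)
    # Clamp so no single word is ever treated as absolutely certain.
    return max(min_spamprob, min(max_spamprob, prob))


# A word seen equally often in both corpora leans toward ham when hambias > 1.
print(word_spamprob(10, 10, 100, 100))
# A word seen only in spam is capped at max_spamprob rather than 1.0.
print(word_spamprob(10, 0, 100, 100))
```

Widening the clamps to 0.001/0.999 lets strongly one-sided words count for more. Raising `hambias` trades some false negatives for fewer false positives.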

> false positive percentages
>     0.667  1.333  lost   +99.85%
>     0.000  0.000  tied
>     1.000  1.000  tied
>     0.333  0.333  tied
>     0.000  0.667  lost  +(was 0)
>     0.000  0.000  tied
>
> won   0 times
> tied  4 times
> lost  2 times
>
> total unique fp went from 6 to 10 lost   +66.67%
> mean fp % went from 0.333333333333 to 0.555555555555 lost   +66.67%

Do you care?  That is, are the new ones false positives in a disturbing
sense?  There are about 10 msgs in my 20,000 ham that I don't care about:
if they flip between being f-p and being classified correctly, I don't give that any weight.
Of course this includes the quote of the Nigerian scam msg, but also
includes two-word "unsubscribe me" messages from first-time posters wrapped
in 5KB of HTML decorations.  I keep these things in the ham set because
they're simply not spam, but nobody would care if they got squashed.

> false negative percentages
>     0.333  0.333  tied
>     1.333  1.000  won    -24.98%
>     1.667  1.333  won    -20.04%
>     0.333  0.333  tied
>     1.333  1.333  tied
>     1.667  0.667  won    -59.99%
>
> won   3 times
> tied  3 times
> lost  0 times
>
> total unique fn went from 20 to 15 won    -25.00%
> mean fn % went from 1.11111111111 to 0.833333333332 won    -25.00%
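The percentages in both tables can be recomputed from raw per-set error counts. Each set holds 300 messages, so 0.333% is 1 message, 0.667% is 2, and so on. Below is a sketch that assumes the comparison script rounds each per-set rate to 3 decimals before computing the relative change. That assumption is consistent with a 2-to-4 jump printing as +99.85% rather than +100%. The helper names `rate`, `change`, `verdict` and `compare` are hypothetical, and the counts are recovered from the percentages quoted above.

```python
def rate(count, n):
    """Error rate in percent for `count` mistakes among `n` messages."""
    return count * 100.0 / n


def change(old, new):
    """Relative change as printed in the report; '+(was 0)' when old is 0."""
    if old == 0:
        return "+(was 0)"
    return f"{(new - old) * 100.0 / old:+.2f}%"


def verdict(old, new):
    # Fewer errors is a win for the new scheme.
    if new == old:
        return "tied"
    return "won" if new < old else "lost"


def compare(before, after, n=300):
    """Compare per-set error counts from two runs over sets of n messages."""
    rows = []
    for b, a in zip(before, after):
        # Assumption: per-set rates are compared as printed, rounded to 3 places.
        old, new = round(rate(b, n), 3), round(rate(a, n), 3)
        status = verdict(old, new)
        rows.append((old, new, status, "" if status == "tied" else change(old, new)))
    tally = {s: sum(1 for r in rows if r[2] == s) for s in ("won", "tied", "lost")}
    tb, ta = sum(before), sum(after)
    total = (tb, ta, verdict(tb, ta), change(tb, ta))
    # Means use the unrounded rates.
    mean = (sum(rate(b, n) for b in before) / len(before),
            sum(rate(a, n) for a in after) / len(after))
    return rows, tally, total, mean


fp = compare([2, 0, 3, 1, 0, 0], [4, 0, 3, 1, 2, 0])  # false positives
fn = compare([1, 4, 5, 1, 4, 5], [1, 3, 4, 1, 4, 2])  # false negatives
for old, new, status, delta in fp[0]:
    print(f"{old:8.3f} {new:6.3f}  {status:5s} {delta}")
print(fp[1], fp[2])
```

Using whole-message counts for the totals avoids the rounding drift that shows up in the per-set percentages.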

Cool!