[Spambayes] Two Scheme Enter, One Scheme Leave.
Anthony Baxter
anthony@interlink.com.au
Wed, 25 Sep 2002 18:09:04 +1000
This is on my mungo-corpus (which, after its most recent update, is
now 11300 spam and 20200 ham), but selecting only 2000/2000. I chose
a seed of 12346, just to do one better than Tim :)
Robinson defaults as supplied by Tim
(Graham on left)
false positive percentages
0.500 0.500 tied
2.500 3.500 lost +40.00%
1.500 2.500 lost +66.67%
2.500 3.000 lost +20.00%
2.500 3.500 lost +40.00%
1.000 0.500 won -50.00%
0.500 2.500 lost +400.00%
1.000 2.500 lost +150.00%
1.500 2.500 lost +66.67%
2.500 2.500 tied
won 1 times
tied 2 times
lost 7 times
total unique fp went from 32 to 47 lost +46.88%
mean fp % went from 1.6 to 2.35 lost +46.88%
false negative percentages
0.000 0.000 tied
1.500 0.500 won -66.67%
0.500 0.000 won -100.00%
0.500 0.000 won -100.00%
1.000 1.000 tied
0.500 0.500 tied
0.500 0.500 tied
1.000 0.000 won -100.00%
0.000 0.000 tied
0.000 0.500 lost +(was 0)
won 4 times
tied 5 times
lost 1 times
total unique fn went from 11 to 6 won -45.45%
mean fn % went from 0.55 to 0.3 won -45.45%
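The won/lost percentages in these listings are consistent with a plain relative change from the left column to the right one. Here is a minimal sketch of that arithmetic, assuming that is how the comparison output derives them (the function name relative_change is made up for illustration):

```python
def relative_change(before, after):
    """Percentage change from `before` to `after`, or None when `before` is 0."""
    if before == 0:
        return None  # the listings show this case as "+(was 0)"
    return (after - before) * 100.0 / before

# Totals taken from the listings above (Graham on the left, Robinson on the right).
print("unique fp: %+.2f%%" % relative_change(32, 47))
print("unique fn: %+.2f%%" % relative_change(11, 6))
```

A positive change is reported as "lost" and a negative one as "won", since both columns count errors.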
Raising spam_cutoff from the default 0.55 to 0.6 (the optimal value for
minimum fn+fp) gives us:
(Graham on left)
total unique fp went from 32 to 11 won -65.62%
mean fp % went from 1.6 to 0.55 won -65.62%
total unique fn went from 11 to 21 lost +90.91%
mean fn % went from 0.55 to 1.05 lost +90.91%
(default Robinson (spam_cutoff 0.550) on left)
total unique fp went from 47 to 11 won -76.60%
mean fp % went from 2.35 to 0.55 won -76.60%
total unique fn went from 6 to 21 lost +250.00%
mean fn % went from 0.3 to 1.05 lost +250.00%
So let's leave spam_cutoff at 0.6 from now on (rather than trying to
juggle 15 different parameters at once).
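Why moving spam_cutoff trades false positives for false negatives can be seen in a toy sketch. It assumes each message gets a score between 0 and 1 and is called spam when the score is above the cutoff. The scores below are invented purely for illustration and are not from the mungo corpus, and count_errors is a hypothetical helper, not part of the classifier:

```python
def count_errors(ham_scores, spam_scores, cutoff):
    """Return (fp, fn), calling a message spam when its score > cutoff."""
    fp = sum(1 for s in ham_scores if s > cutoff)    # ham called spam
    fn = sum(1 for s in spam_scores if s <= cutoff)  # spam called ham
    return fp, fn

# Hypothetical scores, made up only to show the direction of the trade-off.
ham_scores = [0.10, 0.35, 0.52, 0.57, 0.58]
spam_scores = [0.56, 0.59, 0.70, 0.85, 0.95]

for cutoff in (0.55, 0.60):
    fp, fn = count_errors(ham_scores, spam_scores, cutoff)
    print("cutoff=%.2f fp=%d fn=%d" % (cutoff, fp, fn))
```

With these made-up scores the higher cutoff removes the borderline ham errors but lets the borderline spam through, which is the same direction of movement as in the real runs above.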
Summarising values tried for robinson_probability_a
(using the 0.6 cutoff)
a       fp    fn  fp+fn
0.0     18   650    668
0.001   13    36     49
0.01    13    28     41
0.025   12    24     36
0.05    11    23     34
0.075   10    21     31
0.1      9    21     30
0.125   10    21     31
0.15    10    21     31
0.2      9    22     31
0.25     9    22     31
0.35    10    21     31
0.45    10    22     32
0.5     11    21     32  (tim's default)
1.0     13    29     42
2.0     12    42     54
10.0    11    96    107
I have the raw run data for these, if anyone cares. They're rather large :)
So it looks like a=0.1, cutoff=0.6 is the winning combo of these two:
(Graham on left)
total unique fp went from 32 to 9 won -71.88%
mean fp % went from 1.6 to 0.45 won -71.88%
total unique fn went from 11 to 21 lost +90.91%
mean fn % went from 0.55 to 1.05 lost +90.91%
(default Robinson on left)
total unique fp went from 47 to 9 won -80.85%
mean fp % went from 2.35 to 0.45 won -80.85%
total unique fn went from 6 to 21 lost +250.00%
mean fn % went from 0.3 to 1.05 lost +250.00%
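The pick of a=0.1 can be reproduced from the summary table by taking the row with the smallest fp+fn. This is a small sketch of that selection: the rows are copied from the table, while the names results and best_by_total are made up for illustration.

```python
# (a, fp, fn) rows copied from the summary table, all at spam_cutoff 0.6.
results = [
    (0.0, 18, 650), (0.001, 13, 36), (0.01, 13, 28), (0.025, 12, 24),
    (0.05, 11, 23), (0.075, 10, 21), (0.1, 9, 21), (0.125, 10, 21),
    (0.15, 10, 21), (0.2, 9, 22), (0.25, 9, 22), (0.35, 10, 21),
    (0.45, 10, 22), (0.5, 11, 21), (1.0, 13, 29), (2.0, 12, 42),
    (10.0, 11, 96),
]

def best_by_total(rows):
    """Return the (a, fp, fn) row with the smallest fp + fn."""
    return min(rows, key=lambda row: row[1] + row[2])

a, fp, fn = best_by_total(results)
print("a=%s fp=%d fn=%d fp+fn=%d" % (a, fp, fn, fp + fn))
```

Minimising the plain sum treats a false positive and a false negative as equally costly; a weighted sum would be the obvious variation if false positives matter more.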
Next up, the other knobs and dials!
Anthony