[Spambayes] Moving closer to Gary's ideal
Sjoerd Mullender
sjoerd@acm.org
Mon, 23 Sep 2002 10:32:21 +0200
On Sat, Sep 21 2002 Tim Peters wrote:
> """
> [Classifier]
> use_robinson_probability: True
> use_robinson_combining: True
> max_discriminators: 1500
>
> [TestDriver]
> spam_cutoff: 0.50
> """
I tested this against the default options (except that I always run
with count_all_header_lines: True and mine_received_headers: True).
In the results below, the first column is the default scheme and the
second is the settings quoted above:
false positive percentages
    0.524  1.047  lost   +99.81%
    0.000  0.524  lost  +(was 0)
    0.524  0.524  tied
    0.524  1.047  lost   +99.81%
    0.524  1.571  lost  +199.81%

won   0 times
tied  1 times
lost  4 times

total unique fp went from 4 to 9 lost  +125.00%
mean fp % went from 0.418848167539 to 0.942408376964 lost  +125.00%
false negative percentages
    1.571  0.000  won   -100.00%
    2.618  2.094  won    -20.02%
    1.571  0.524  won    -66.65%
    0.524  0.524  tied
    1.571  1.047  won    -33.35%

won   4 times
tied  1 times
lost  0 times

total unique fn went from 15 to 8 won    -46.67%
mean fn % went from 1.57068062827 to 0.83769633508 won    -46.67%
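
For anyone who wants to check the won/lost columns by hand, here is a minimal sketch of how the percentage changes above can be reproduced. It is not the comparison script itself: the helper names rel_change and describe are made up for illustration, and the figure of 955 messages per class is taken from the histogram headers further down.

```python
def rel_change(before, after):
    """Relative change in percent, or None when the baseline is zero."""
    if before == 0:
        return None
    return (after - before) / before * 100.0


def describe(before, after):
    """Hypothetical helper: label one before/after pair of error rates."""
    if after == before:
        return "tied"
    verdict = "lost" if after > before else "won"
    change = rel_change(before, after)
    if change is None:
        # No meaningful percentage when the baseline rate was 0.
        return f"{verdict} +(was 0)"
    return f"{verdict} {change:+.2f}%"


# (default, changed) pairs copied from the tables above.
fp_pairs = [(0.524, 1.047), (0.000, 0.524), (0.524, 0.524),
            (0.524, 1.047), (0.524, 1.571)]
fn_pairs = [(1.571, 0.000), (2.618, 2.094), (1.571, 0.524),
            (0.524, 0.524), (1.571, 1.047)]

for label, pairs in (("fp", fp_pairs), ("fn", fn_pairs)):
    for before, after in pairs:
        print(label, before, after, describe(before, after))

# Totals: 4 -> 9 unique fp and 15 -> 8 unique fn, out of 955 each.
N = 955
print("total fp", describe(4, 9), " mean fp %", 4 / N * 100, 9 / N * 100)
print("total fn", describe(15, 8), " mean fn %", 15 / N * 100, 8 / N * 100)
```

The mean percentages come out as the unique counts divided by 955, which is why the "total unique" and "mean" lines show the same relative change.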
The histograms for the default scheme show the usual pattern, but the
histograms with the changed parameters look like this:
Ham distribution for all runs:
955 items; mean 26.28; sample sdev 8.12
* = 3 items
0.00 0
2.50 0
5.00 0
7.50 40 **************
10.00 0
12.50 61 *********************
15.00 27 *********
17.50 47 ****************
20.00 96 ********************************
22.50 127 *******************************************
25.00 155 ****************************************************
27.50 127 *******************************************
30.00 96 ********************************
32.50 65 **********************
35.00 44 ***************
37.50 24 ********
40.00 13 *****
42.50 13 *****
45.00 5 **
47.50 6 **
50.00 8 ***
52.50 1 *
55.00 0
Spam distribution for all runs:
955 items; mean 68.60; sample sdev 8.43
* = 2 items
32.50 0
35.00 1 *
37.50 2 *
40.00 0
42.50 0
45.00 3 **
47.50 2 *
50.00 10 *****
52.50 15 ********
55.00 31 ****************
57.50 70 ***********************************
60.00 93 ***********************************************
62.50 117 ***********************************************************
65.00 109 *******************************************************
67.50 117 ***********************************************************
70.00 103 ****************************************************
72.50 58 *****************************
75.00 78 ***************************************
77.50 59 ******************************
80.00 34 *****************
82.50 24 ************
85.00 8 ****
87.50 6 ***
90.00 15 ********
92.50 0
95.00 0
97.50 0
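
As a cross-check, the error totals can be read straight off the two histograms. The sketch below assumes that each bucket label is the lower edge of a 2.5-point bucket and that a message scoring at or above spam_cutoff (0.50, i.e. 50.00 on this scale) is called spam; both are my reading of the output, not something the test driver states. Empty buckets are left out.

```python
CUTOFF = 50.0  # spam_cutoff: 0.50, expressed on the histogram's 0-100 scale

# (bucket lower edge, count) pairs copied from the histograms above.
ham = [(7.5, 40), (12.5, 61), (15.0, 27), (17.5, 47), (20.0, 96),
       (22.5, 127), (25.0, 155), (27.5, 127), (30.0, 96), (32.5, 65),
       (35.0, 44), (37.5, 24), (40.0, 13), (42.5, 13), (45.0, 5),
       (47.5, 6), (50.0, 8), (52.5, 1)]
spam = [(35.0, 1), (37.5, 2), (45.0, 3), (47.5, 2), (50.0, 10),
        (52.5, 15), (55.0, 31), (57.5, 70), (60.0, 93), (62.5, 117),
        (65.0, 109), (67.5, 117), (70.0, 103), (72.5, 58), (75.0, 78),
        (77.5, 59), (80.0, 34), (82.5, 24), (85.0, 8), (87.5, 6),
        (90.0, 15)]

ham_total = sum(count for _, count in ham)
spam_total = sum(count for _, count in spam)

# Ham at or above the cutoff is a false positive;
# spam below the cutoff is a false negative.
false_pos = sum(count for edge, count in ham if edge >= CUTOFF)
false_neg = sum(count for edge, count in spam if edge < CUTOFF)

print("ham items:", ham_total, " false positives:", false_pos)
print("spam items:", spam_total, " false negatives:", false_neg)
```

Under those assumptions the tails give 9 false positives and 8 false negatives, which agrees with the "total unique" lines for the changed parameters.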
-- Sjoerd Mullender <sjoerd@acm.org>