[Spambayes] RE: Further Improvement 2
Sjoerd Mullender
sjoerd@acm.org
Mon, 23 Sep 2002 16:04:58 +0200
On Sat, Sep 21 2002 Tim Peters wrote:
> Since I got a big win without any effort <wink> by introducing a brand new
> "ignore probs that aren't at least this far from neutral" knob, that's the
> one I'm most inclined to play with right now. There isn't a knob in
> existence that won't be played with, but especially large tests take
> significant wall-clock time to complete, and there's only so much testing
> one can do in a day.
>
> Testers, "a" is already exposed via:
>
> [Classifier]
> robinson_probability_a: 1.0
>
> I think values nearer to 0 are most likely to be most interesting.
Here are my results. The before run used these options:
"""
[Classifier]
use_robinson_probability: True
use_robinson_combining: True
max_discriminators: 1500
[TestDriver]
spam_cutoff: 0.50
"""
and the after run adds
"""
robinson_probability_a: 0.1
"""
to the [Classifier] section.
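For anyone reproducing this, the two option sets can be merged and checked with Python's standard configparser module. This is only a sketch: Spambayes has its own option handling, and the names BEFORE, AFTER_EXTRA, before, after and added are made up here for illustration.

```python
import configparser

# Options for the "before" run, copied from the message.
BEFORE = """
[Classifier]
use_robinson_probability: True
use_robinson_combining: True
max_discriminators: 1500

[TestDriver]
spam_cutoff: 0.50
"""

# The single option the "after" run adds; the section name comes from Tim's note.
AFTER_EXTRA = """
[Classifier]
robinson_probability_a: 0.1
"""

before = configparser.ConfigParser()
before.read_string(BEFORE)

# A second read_string call merges into sections that are already loaded,
# so the after run is the before run plus one key.
after = configparser.ConfigParser()
after.read_string(BEFORE)
after.read_string(AFTER_EXTRA)

added = set(after["Classifier"]) - set(before["Classifier"])
print(sorted(added))
print(after.getfloat("Classifier", "robinson_probability_a"))
```

Diffing the two parsed sections like this is a cheap way to confirm that the only difference between the runs is the one knob under test.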
false positive percentages
    0.000  0.000  tied
    0.000  0.000  tied
    0.000  0.000  tied
    0.405  0.540  lost   +33.33%
    0.270  0.135  won    -50.00%
won   1 times
tied  3 times
lost  1 times
total unique fp went from 5 to 5 tied
mean fp % went from 0.134952766532 to 0.134952766532 tied
false negative percentages
    1.571  1.571  tied
    2.618  2.094  won    -20.02%
    0.524  1.047  lost   +99.81%
    1.571  0.000  won   -100.00%
    2.094  1.571  won    -24.98%
won   3 times
tied  1 times
lost  1 times
total unique fn went from 16 to 12 won -25.00%
mean fn % went from 1.67539267016 to 1.25654450262 won -25.00%
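The won/tied/lost columns above are relative changes in the error rate from the before run to the after run, where a lower rate counts as a win. A minimal sketch of that arithmetic, using the false positive figures from the table, is below. The function name compare and its return shape are made up for illustration; this is not the project's comparison script.

```python
def compare(before, after):
    """Classify each before/after rate pair as won, tied, or lost."""
    tally = {"won": 0, "tied": 0, "lost": 0}
    rows = []
    for b, a in zip(before, after):
        if a == b:
            tag, change = "tied", None
        else:
            # A lower error rate after the change counts as a win.
            tag = "won" if a < b else "lost"
            # Relative change is undefined when the before rate is 0.
            change = (a - b) / b * 100.0 if b else None
        tally[tag] += 1
        rows.append((b, a, tag, change))
    return rows, tally


# False positive percentages for the five runs, copied from the table.
fp_before = [0.000, 0.000, 0.000, 0.405, 0.270]
fp_after = [0.000, 0.000, 0.000, 0.540, 0.135]

rows, tally = compare(fp_before, fp_after)
for b, a, tag, change in rows:
    suffix = "" if change is None else f" {change:+.2f}%"
    print(f"{b:.3f} {a:.3f} {tag}{suffix}")
print(tally)
```

With so few false positives per run, one message moving across the cutoff is enough to produce a swing of tens of percent, so the relative figures are best read alongside the unique totals.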
And the before histograms:
Ham distribution for all runs:
3705 items; mean 23.12; sample sdev 7.44
* = 12 items
0.00 0
2.50 0
5.00 105 *********
7.50 69 ******
10.00 98 *********
12.50 306 **************************
15.00 177 ***************
17.50 352 ******************************
20.00 497 ******************************************
22.50 701 ***********************************************************
25.00 506 *******************************************
27.50 354 ******************************
30.00 220 *******************
32.50 99 *********
35.00 93 ********
37.50 49 *****
40.00 25 ***
42.50 26 ***
45.00 16 **
47.50 7 *
50.00 1 *
52.50 4 *
55.00 0
Spam distribution for all runs:
955 items; mean 68.33; sample sdev 8.74
* = 2 items
32.50 0
35.00 1 *
37.50 2 *
40.00 0
42.50 0
45.00 4 **
47.50 9 *****
50.00 9 *****
52.50 18 *********
55.00 42 *********************
57.50 73 *************************************
60.00 88 ********************************************
62.50 104 ****************************************************
65.00 111 ********************************************************
67.50 105 *****************************************************
70.00 109 *******************************************************
72.50 53 ***************************
75.00 76 **************************************
77.50 62 *******************************
80.00 37 *******************
82.50 23 ************
85.00 8 ****
87.50 6 ***
90.00 15 ********
92.50 0
95.00 0
97.50 0
And finally the after histograms:
Ham distribution for all runs:
3705 items; mean 21.33; sample sdev 6.94
* = 12 items
0.00 0
2.50 0
5.00 152 *************
7.50 24 **
10.00 197 *****************
12.50 328 ****************************
15.00 261 **********************
17.50 498 ******************************************
20.00 689 **********************************************************
22.50 610 ***************************************************
25.00 393 *********************************
27.50 218 *******************
30.00 117 **********
32.50 79 *******
35.00 71 ******
37.50 25 ***
40.00 11 *
42.50 19 **
45.00 6 *
47.50 2 *
50.00 4 *
52.50 1 *
55.00 0
Spam distribution for all runs:
955 items; mean 72.04; sample sdev 10.31
* = 3 items
35.00 0
37.50 1 *
40.00 1 *
42.50 1 *
45.00 3 *
47.50 6 **
50.00 14 *****
52.50 15 *****
55.00 30 **********
57.50 42 **************
60.00 63 *********************
62.50 61 *********************
65.00 74 *************************
67.50 91 *******************************
70.00 102 **********************************
72.50 123 *****************************************
75.00 69 ***********************
77.50 49 *****************
80.00 45 ***************
82.50 38 *************
85.00 44 ***************
87.50 42 **************
90.00 17 ******
92.50 12 ****
95.00 12 ****
97.50 0
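A note on reading the bars: their lengths in the listings above look consistent with rounding each bucket count up to a whole number of stars, so any non-empty bucket shows at least one star. That rule is inferred from the listings, not taken from the Spambayes source, and star_bar is a made-up name for this sketch.

```python
def star_bar(count, items_per_star):
    """Return one '*' per started group of items_per_star items."""
    # Ceiling division: a non-zero count always gets at least one star.
    return "*" * (-(-count // items_per_star))


# A few (bucket, count) pairs from the "after" ham histogram, where * = 12 items.
ham_after = [(5.00, 152), (7.50, 24), (10.00, 197), (52.50, 1), (55.00, 0)]
for bucket, count in ham_after:
    print(f"{bucket:.2f} {count} {star_bar(count, 12)}".rstrip())
```

Because the scale differs between listings (12, 2 and 3 items per star), bar lengths are only comparable within a single histogram; the counts are the numbers to compare across runs.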
-- Sjoerd Mullender <sjoerd@acm.org>