[Spambayes] There Can Be Only One
Skip Montanaro
skip@pobox.com
Wed, 25 Sep 2002 13:54:08 -0500
And now, coming in from faaaar out in right field, we have Skip, the weirdo
with the wacky results:
grahams -> fws
-> <stat> tested 200 hams & 200 spams against 1800 hams & 1800 spams
...
false positive percentages
0.000 0.000 tied
0.000 0.500 lost +(was 0)
0.000 0.000 tied
0.000 0.000 tied
0.000 0.000 tied
0.000 0.000 tied
0.500 0.500 tied
0.000 0.000 tied
0.500 0.000 won -100.00%
0.000 0.000 tied
won 1 times
tied 8 times
lost 1 times
total unique fp went from 2 to 2 tied
mean fp % went from 0.1 to 0.1 tied
false negative percentages
9.000 10.500 lost +16.67%
11.000 14.000 lost +27.27%
10.000 12.000 lost +20.00%
7.000 8.500 lost +21.43%
14.000 19.500 lost +39.29%
12.000 12.000 tied
13.000 17.000 lost +30.77%
9.000 12.500 lost +38.89%
9.000 13.000 lost +44.44%
9.500 12.000 lost +26.32%
won 0 times
tied 1 times
lost 9 times
total unique fn went from 207 to 262 lost +26.57%
mean fn % went from 10.35 to 13.1 lost +26.57%
ham mean                     ham sdev
   0.00   21.97     +(was 0)    0.00    6.82     +(was 0)
   0.58   23.40    +3934.48%    5.62    8.78      +56.23%
   0.00   22.83     +(was 0)    0.00    7.58     +(was 0)
   0.00   22.52     +(was 0)    0.00    7.65     +(was 0)
   0.00   22.96     +(was 0)    0.00    7.78     +(was 0)
   0.03   22.27   +74133.33%    0.42    7.70    +1733.33%
   0.50   22.79    +4458.00%    7.07    7.67       +8.49%
   0.00   22.77     +(was 0)    0.00    6.88     +(was 0)
   0.50   22.44    +4388.00%    7.07    7.61       +7.64%
   0.00   22.76     +(was 0)    0.00    7.13     +(was 0)
ham mean and sdev for all runs
   0.16   22.67   +14068.75%    3.63    7.57     +108.54%
spam mean                    spam sdev
  91.02   69.87      -23.24%   28.49   12.08      -57.60%
  89.95   67.92      -24.49%   29.71   12.32      -58.53%
  90.62   69.54      -23.26%   28.80   11.87      -58.78%
  93.70   71.21      -24.00%   23.90   10.83      -54.69%
  86.81   66.41      -23.50%   33.55   13.55      -59.61%
  88.40   69.92      -20.90%   31.84   12.23      -61.59%
  88.89   68.79      -22.61%   30.41   13.18      -56.66%
  91.15   69.36      -23.91%   28.24   12.28      -56.52%
  91.38   69.29      -24.17%   27.94   12.55      -55.08%
  90.73   70.44      -22.36%   28.83   12.34      -57.20%
spam mean and sdev for all runs
  90.27   69.27      -23.26%   29.26   12.38      -57.69%
ham/spam mean difference: 90.11 46.60 -43.51
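
For anyone wanting to check the arithmetic in the summary lines, here is a minimal sketch of how the percentage changes and the mean difference can be recomputed by hand. The helper name pct_change is made up for illustration and is not taken from the spambayes scripts; the numbers are copied from the output above.

```python
def pct_change(before, after):
    """Relative change from before to after, in percent.

    Returns None when before is 0, which is where the report
    prints "+(was 0)" instead of a percentage.
    """
    if before == 0:
        return None
    return (after - before) / before * 100.0

# Total unique false negatives: 207 under grahams, 262 under fws.
fn_change = pct_change(207, 262)
print(f"fn change: {fn_change:+.2f}%")

# All-runs means as (grahams, fws) pairs.
ham_mean = (0.16, 22.67)
spam_mean = (90.27, 69.27)

# Gap between the spam mean and the ham mean under each scheme.
diff_before = spam_mean[0] - ham_mean[0]
diff_after = spam_mean[1] - ham_mean[1]
print(f"mean difference: {diff_before:.2f} {diff_after:.2f} "
      f"{diff_after - diff_before:+.2f}")
```

The "+(was 0)" entries in the tables correspond to the case where the first value is zero and no percentage can be formed.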
Here are the overall graphs from the f(w) run:
-> <stat> Ham scores for all runs: 2000 items; mean 22.67; sample sdev 7.57
* = 5 items
0.00 5 *
2.50 6 **
5.00 16 ****
7.50 48 **********
10.00 80 ****************
12.50 121 *************************
15.00 195 ***************************************
17.50 277 ********************************************************
20.00 278 ********************************************************
22.50 260 ****************************************************
25.00 257 ****************************************************
27.50 168 **********************************
30.00 125 *************************
32.50 55 ***********
35.00 40 ********
37.50 20 ****
40.00 18 ****
42.50 12 ***
45.00 7 **
47.50 4 *
50.00 1 *
52.50 5 *
55.00 0
57.50 0
60.00 2 *
62.50 0
65.00 0
67.50 0
70.00 0
72.50 0
75.00 0
77.50 0
80.00 0
82.50 0
85.00 0
87.50 0
90.00 0
92.50 0
95.00 0
97.50 0
-> <stat> Spam scores for all runs: 2000 items; mean 69.27; sample sdev 12.38
* = 3 items
0.00 0
2.50 0
5.00 0
7.50 0
10.00 0
12.50 0
15.00 0
17.50 0
20.00 1 *
22.50 0
25.00 1 *
27.50 2 *
30.00 1 *
32.50 6 **
35.00 4 **
37.50 10 ****
40.00 23 ********
42.50 20 *******
45.00 37 *************
47.50 41 **************
50.00 60 ********************
52.50 56 *******************
55.00 70 ************************
57.50 107 ************************************
60.00 124 ******************************************
62.50 121 *****************************************
65.00 151 ***************************************************
67.50 163 *******************************************************
70.00 166 ********************************************************
72.50 169 *********************************************************
75.00 139 ***********************************************
77.50 126 ******************************************
80.00 102 **********************************
82.50 95 ********************************
85.00 79 ***************************
87.50 53 ******************
90.00 46 ****************
92.50 23 ********
95.00 4 **
97.50 0
It looks to me like the problem (whatever it is) with my data is that it
causes the standard deviation to get quite large, though the mean spam score
seems to be a bit lower than in other results I've seen. What would cause that?
More variability in my spam?
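
One rough way to put a number on this is to divide the gap between the ham and spam means by the sum of the two standard deviations. This is only an illustrative sketch: separation is a made-up helper, not something the spambayes tools compute, and the inputs are the all-runs means and sdevs from the comparison above.

```python
def separation(ham_mean, ham_sdev, spam_mean, spam_sdev):
    """Gap between the two means, measured in units of the combined sdev.

    A crude, assumed measure of how well the two score
    distributions are pulled apart; larger means less overlap.
    """
    return (spam_mean - ham_mean) / (ham_sdev + spam_sdev)

# All-runs figures from the comparison output.
graham = separation(0.16, 3.63, 90.27, 29.26)
fw = separation(22.67, 7.57, 69.27, 12.38)

print(f"grahams: {graham:.2f}  fws: {fw:.2f}")
```

By this measure the two schemes come out fairly close on this data, with the f(w) run a little lower: its spam sdev shrinks a lot, but the means also move much closer together.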
Skip