[Spambayes] chi-squared versus "prob strength"

Anthony Baxter anthony@interlink.com.au
Mon, 14 Oct 2002 16:31:31 +1000


>>> Tim Peters wrote
> I was, but more importantly my test data agreed, so I'm going to switch to
> this (the evidence is so consistent and solid on both our datasets that
> making it an option would supply a pointless choice -- losers are killed).
> Good show!

Here's what my mungo-test set shows for this ("before" is prior to Rob
Hooft's change, "after" is current CVS):

chi2s.txt -> chi2as.txt
-> <stat> tested 3490 hams & 1687 spams against 31410 hams & 15161 spams
-> <stat> tested 3490 hams & 1682 spams against 31410 hams & 15166 spams
-> <stat> tested 3490 hams & 1688 spams against 31410 hams & 15160 spams
-> <stat> tested 3490 hams & 1679 spams against 31410 hams & 15169 spams
-> <stat> tested 3490 hams & 1686 spams against 31410 hams & 15162 spams
-> <stat> tested 3490 hams & 1688 spams against 31410 hams & 15160 spams
-> <stat> tested 3490 hams & 1678 spams against 31410 hams & 15170 spams
-> <stat> tested 3490 hams & 1688 spams against 31410 hams & 15160 spams
-> <stat> tested 3490 hams & 1683 spams against 31410 hams & 15165 spams
-> <stat> tested 3490 hams & 1689 spams against 31410 hams & 15159 spams
-> <stat> tested 3490 hams & 1687 spams against 31410 hams & 15161 spams
-> <stat> tested 3490 hams & 1682 spams against 31410 hams & 15166 spams
-> <stat> tested 3490 hams & 1688 spams against 31410 hams & 15160 spams
-> <stat> tested 3490 hams & 1679 spams against 31410 hams & 15169 spams
-> <stat> tested 3490 hams & 1686 spams against 31410 hams & 15162 spams
-> <stat> tested 3490 hams & 1688 spams against 31410 hams & 15160 spams
-> <stat> tested 3490 hams & 1678 spams against 31410 hams & 15170 spams
-> <stat> tested 3490 hams & 1688 spams against 31410 hams & 15160 spams
-> <stat> tested 3490 hams & 1683 spams against 31410 hams & 15165 spams
-> <stat> tested 3490 hams & 1689 spams against 31410 hams & 15159 spams

false positive percentages
    0.946  0.974  lost    +2.96%
    0.917  0.917  tied          
    0.802  0.831  lost    +3.62%
    0.659  0.860  lost   +30.50%
    0.573  0.659  lost   +15.01%
    0.802  0.831  lost    +3.62%
    0.716  0.745  lost    +4.05%
    0.516  0.544  lost    +5.43%
    0.630  0.688  lost    +9.21%
    0.917  1.003  lost    +9.38%

won   0 times
tied  1 times
lost  9 times

total unique fp went from 261 to 281 lost    +7.66%
mean fp % went from 0.747851002865 to 0.805157593123 lost    +7.66%

false negative percentages
    0.356  0.296  won    -16.85%
    0.119  0.059  won    -50.42%
    0.237  0.237  tied          
    0.476  0.476  tied          
    0.297  0.237  won    -20.20%
    0.415  0.415  tied          
    0.596  0.477  won    -19.97%
    0.296  0.237  won    -19.93%
    0.416  0.416  tied          
    0.355  0.296  won    -16.62%

won   6 times
tied  4 times
lost  0 times

total unique fn went from 60 to 53 won    -11.67%
mean fn % went from 0.356257958499 to 0.314689990048 won    -11.67%
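
The won/tied/lost labels and the percentage columns in the two tables above
can be reproduced with simple arithmetic. A minimal sketch, assuming that a
lower error rate counts as a win and that the percentage is the relative
change from "before" to "after". The function names are made up for
illustration and are not necessarily what the comparison script uses:

```python
def pct_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0

def verdict(before, after):
    """Label one before/after pair of error rates; lower is better."""
    if after < before:
        return "won"
    if after > before:
        return "lost"
    return "tied"

def tally(pairs):
    """Count won/tied/lost verdicts over (before, after) pairs."""
    counts = {"won": 0, "tied": 0, "lost": 0}
    for before, after in pairs:
        counts[verdict(before, after)] += 1
    return counts

# Per-run false positive percentages, copied from the table above.
fp_pairs = [(0.946, 0.974), (0.917, 0.917), (0.802, 0.831), (0.659, 0.860),
            (0.573, 0.659), (0.802, 0.831), (0.716, 0.745), (0.516, 0.544),
            (0.630, 0.688), (0.917, 1.003)]

print(tally(fp_pairs))                   # {'won': 0, 'tied': 1, 'lost': 9}
print("%+.2f%%" % pct_change(261, 281))  # +7.66%  (total unique fp)
print("%+.2f%%" % pct_change(60, 53))    # -11.67% (total unique fn)
```

Every run tests 3490 hams, so the mean fp % is the same as the pooled rate
(261 / 34900 before). The spam count differs slightly from run to run, which
is why the mean fn % is not exactly 60 / 16848.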

ham mean                     ham sdev
   3.46    3.24   -6.36%       12.12   11.96   -1.32%
   3.01    2.85   -5.32%       11.48   11.39   -0.78%
   3.28    3.01   -8.23%       11.45   11.22   -2.01%
   3.23    3.02   -6.50%       11.43   11.27   -1.40%
   3.15    2.88   -8.57%       10.65   10.37   -2.63%
   3.17    2.95   -6.94%       11.30   11.07   -2.04%
   3.27    3.02   -7.65%       11.29   10.94   -3.10%
   3.06    2.82   -7.84%       10.51   10.20   -2.95%
   3.32    3.13   -5.72%       11.37   11.18   -1.67%
   3.45    3.21   -6.96%       11.75   11.59   -1.36%

ham mean and sdev for all runs
   3.24    3.01   -7.10%       11.34   11.13   -1.85%

spam mean                    spam sdev
  99.75   99.76   +0.01%        3.91    3.85   -1.53%
  99.90   99.91   +0.01%        1.62    1.38  -14.81%
  99.81   99.82   +0.01%        3.09    3.05   -1.29%
  99.60   99.62   +0.02%        4.92    4.80   -2.44%
  99.78   99.78   +0.00%        3.24    3.36   +3.70%
  99.78   99.78   +0.00%        3.04    3.14   +3.29%
  99.62   99.62   +0.00%        4.73    4.78   +1.06%
  99.79   99.81   +0.02%        2.75    2.66   -3.27%
  99.66   99.66   +0.00%        4.47    4.62   +3.36%
  99.70   99.70   +0.00%        4.37    4.32   -1.14%

spam mean and sdev for all runs
  99.74   99.75   +0.01%        3.75    3.75   +0.00%

ham/spam mean difference: 96.50 96.74 +0.24
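
The mean difference line is the gap between the all-runs spam mean and the
all-runs ham mean, before and after. A short check using the figures printed
above (the helper name is hypothetical):

```python
def mean_gap(ham_mean, spam_mean):
    """Distance between the mean spam score and the mean ham score."""
    return spam_mean - ham_mean

before = mean_gap(3.24, 99.74)   # all-runs means, "before" column
after = mean_gap(3.01, 99.75)    # all-runs means, "after" column
print("%.2f %.2f %+.2f" % (before, after, after - before))
# 96.50 96.74 +0.24
```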

Here are the histograms from the 'after' case:

-> <stat> Ham scores for all runs: 34900 items; mean 3.01; sdev 11.13
-> <stat> min -9.99201e-14; median 0.000498415; max 100
* = 448 items
 0.0 27319 *************************************************************
 0.5  1129 ***
 1.0   695 **
 1.5   507 **
 2.0   412 *
 2.5   320 *
 3.0   269 *
 3.5   241 *
 4.0   194 *
 4.5   178 *
 5.0   151 *
 5.5   114 *
 6.0   131 *
 6.5   129 *
 7.0   106 *
 7.5   104 *
 8.0   103 *
 8.5    84 *
 9.0    76 *
 9.5    85 *
10.0    65 *
10.5    60 *
11.0    73 *
11.5    54 *
12.0    63 *
12.5    50 *
13.0    59 *
13.5    51 *
14.0    65 *
14.5    43 *
15.0    31 *
15.5    50 *
16.0    40 *
16.5    38 *
17.0    39 *
17.5    37 *
18.0    27 *
18.5    31 *
19.0    40 *
19.5    31 *
20.0    41 *
20.5    27 *
21.0    27 *
21.5    29 *
22.0    26 *
22.5    34 *
23.0    23 *
23.5    26 *
24.0    31 *
24.5    23 *
25.0    12 *
25.5    15 *
26.0    16 *
26.5    27 *
27.0    27 *
27.5    27 *
28.0    18 *
28.5    25 *
29.0    16 *
29.5    19 *
30.0    19 *
30.5    17 *
31.0    14 *
31.5    18 *
32.0    16 *
32.5    12 *
33.0    29 *
33.5    19 *
34.0     6 *
34.5    15 *
35.0    14 *
35.5    15 *
36.0    19 *
36.5    11 *
37.0     9 *
37.5    12 *
38.0    13 *
38.5    10 *
39.0    12 *
39.5    15 *
40.0    13 *
40.5    12 *
41.0     9 *
41.5    14 *
42.0    14 *
42.5    13 *
43.0    21 *
43.5    16 *
44.0    11 *
44.5     7 *
45.0    10 *
45.5     8 *
46.0     9 *
46.5    10 *
47.0     9 *
47.5     9 *
48.0     9 *
48.5    10 *
49.0    12 *
49.5    20 *
50.0    31 *
50.5     8 *
51.0    12 *
51.5     6 *
52.0    10 *
52.5     8 *
53.0    10 *
53.5     3 *
54.0     9 *
54.5     5 *
55.0    16 *
55.5    14 *
56.0     6 *
56.5     7 *
57.0    10 *
57.5     8 *
58.0     6 *
58.5     7 *
59.0    11 *
59.5     3 *
60.0     5 *
60.5     9 *
61.0     3 *
61.5     5 *
62.0     5 *
62.5     5 *
63.0     5 *
63.5     9 *
64.0    10 *
64.5     8 *
65.0     5 *
65.5     7 *
66.0     7 *
66.5     3 *
67.0     3 *
67.5     5 *
68.0     7 *
68.5     3 *
69.0     5 *
69.5     6 *
70.0     6 *
70.5     3 *
71.0     2 *
71.5     5 *
72.0     5 *
72.5     1 *
73.0     1 *
73.5     6 *
74.0     2 *
74.5     8 *
75.0     5 *
75.5     5 *
76.0     5 *
76.5     7 *
77.0     5 *
77.5     3 *
78.0     4 *
78.5     4 *
79.0     2 *
79.5     2 *
80.0     2 *
80.5     4 *
81.0     7 *
81.5     4 *
82.0     6 *
82.5     5 *
83.0     1 *
83.5     5 *
84.0     4 *
84.5     2 *
85.0     4 *
85.5     4 *
86.0     2 *
86.5     1 *
87.0     8 *
87.5     6 *
88.0     3 *
88.5     5 *
89.0     2 *
89.5     3 *
90.0     0 
90.5     0 
91.0     1 *
91.5     3 *
92.0     1 *
92.5     3 *
93.0     5 *
93.5     5 *
94.0     5 *
94.5     3 *
95.0     8 *
95.5     4 *
96.0     1 *
96.5     3 *
97.0     5 *
97.5     4 *
98.0     5 *
98.5     8 *
99.0     8 *
99.5    50 *
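
The "* = 448 items" scale line of the ham histogram above (and "* = 273
items" for the spam histogram) is consistent with scaling the largest bucket
to fit a fixed bar width and printing one star for each unit of items plus
one for any remainder. A sketch under that assumption; the width of 61 is
inferred from the two scale factors rather than taken from the test driver's
source, and the function names are illustrative:

```python
WIDTH = 61  # assumed maximum bar width, inferred from the output

def items_per_star(biggest, width=WIDTH):
    """Smallest whole number of items per star so the biggest bucket fits."""
    if biggest <= width:
        return 1
    return -(-biggest // width)  # ceiling division

def bar(count, unit):
    """One star per `unit` items, plus one star for a non-zero remainder."""
    return "*" * (-(-count // unit))

unit = items_per_star(27319)      # largest ham bucket
print(unit)                       # 448
print(len(bar(27319, unit)))      # 61
print(bar(1129, unit))            # ***
print(repr(bar(0, unit)))         # ''
```

Because of the round-up, a bucket holding a single message still gets one
star, so the long tail of small buckets stays visible.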

-> <stat> Spam scores for all runs: 16848 items; mean 99.75; sdev 3.75
-> <stat> min 0.00333927; median 100; max 100
* = 273 items
 0.0     1 *
 0.5     1 *
 1.0     1 *
 1.5     0 
 2.0     0 
 2.5     1 *
 3.0     1 *
 3.5     0 
 4.0     0 
 4.5     0 
 5.0     1 *
 5.5     0 
 6.0     0 
 6.5     0 
 7.0     0 
 7.5     0 
 8.0     1 *
 8.5     0 
 9.0     1 *
 9.5     0 
10.0     0 
10.5     0 
11.0     0 
11.5     0 
12.0     1 *
12.5     0 
13.0     0 
13.5     0 
14.0     2 *
14.5     0 
15.0     0 
15.5     0 
16.0     0 
16.5     0 
17.0     1 *
17.5     2 *
18.0     0 
18.5     0 
19.0     0 
19.5     0 
20.0     0 
20.5     0 
21.0     1 *
21.5     1 *
22.0     0 
22.5     0 
23.0     0 
23.5     0 
24.0     0 
24.5     2 *
25.0     0 
25.5     1 *
26.0     1 *
26.5     0 
27.0     0 
27.5     0 
28.0     0 
28.5     0 
29.0     0 
29.5     0 
30.0     0 
30.5     0 
31.0     0 
31.5     0 
32.0     0 
32.5     0 
33.0     0 
33.5     0 
34.0     0 
34.5     0 
35.0     0 
35.5     0 
36.0     0 
36.5     0 
37.0     0 
37.5     1 *
38.0     1 *
38.5     0 
39.0     0 
39.5     0 
40.0     0 
40.5     0 
41.0     0 
41.5     1 *
42.0     0 
42.5     1 *
43.0     0 
43.5     1 *
44.0     0 
44.5     0 
45.0     0 
45.5     0 
46.0     0 
46.5     1 *
47.0     0 
47.5     0 
48.0     0 
48.5     1 *
49.0     0 
49.5     2 *
50.0     3 *
50.5     2 *
51.0     0 
51.5     2 *
52.0     0 
52.5     1 *
53.0     0 
53.5     0 
54.0     1 *
54.5     0 
55.0     0 
55.5     0 
56.0     0 
56.5     2 *
57.0     1 *
57.5     1 *
58.0     0 
58.5     1 *
59.0     0 
59.5     0 
60.0     1 *
60.5     0 
61.0     0 
61.5     1 *
62.0     0 
62.5     0 
63.0     0 
63.5     1 *
64.0     0 
64.5     1 *
65.0     0 
65.5     2 *
66.0     1 *
66.5     0 
67.0     0 
67.5     0 
68.0     1 *
68.5     0 
69.0     1 *
69.5     1 *
70.0     0 
70.5     1 *
71.0     0 
71.5     1 *
72.0     0 
72.5     3 *
73.0     0 
73.5     0 
74.0     1 *
74.5     0 
75.0     1 *
75.5     1 *
76.0     0 
76.5     1 *
77.0     1 *
77.5     0 
78.0     6 *
78.5     0 
79.0     1 *
79.5     0 
80.0     1 *
80.5     0 
81.0     1 *
81.5     1 *
82.0     1 *
82.5     1 *
83.0     0 
83.5     0 
84.0     0 
84.5     0 
85.0     3 *
85.5     1 *
86.0     2 *
86.5     2 *
87.0     2 *
87.5     0 
88.0     2 *
88.5     0 
89.0     0 
89.5     2 *
90.0     0 
90.5     1 *
91.0     0 
91.5     0 
92.0     4 *
92.5     5 *
93.0     2 *
93.5     1 *
94.0     2 *
94.5     4 *
95.0     2 *
95.5     8 *
96.0     3 *
96.5     5 *
97.0     9 *
97.5    10 *
98.0     9 *
98.5    22 *
99.0    44 *
99.5 16628 *************************************************************
-> best cutoff for all runs: 0.995
->     with weighted total 10*50 fp + 220 fn = 720
->     fp rate 0.143%  fn rate 1.31%
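
The "best cutoff" figures can be checked against the histograms: at a cutoff
of 0.995 the 50 hams in the last ham bucket (99.5) are false positives, and
the 16848 - 16628 = 220 spams below the last spam bucket are false
negatives. A minimal sketch of the weighted cost and of a brute-force scan
over bucket boundaries, assuming scores on a 0-1 scale, a message being
called spam when its score is at or above the cutoff, and a false positive
costing ten times a false negative as in the output above. The function
names and the scan itself are illustrative, not the test driver's actual
code:

```python
def weighted_total(fp, fn, fp_weight=10, fn_weight=1):
    """Total cost when each false positive counts fp_weight times."""
    return fp_weight * fp + fn_weight * fn

def best_cutoff(ham_scores, spam_scores, nbuckets=200,
                fp_weight=10, fn_weight=1):
    """Try every bucket boundary; return (cutoff, fp, fn, cost) of the cheapest."""
    best = None
    for i in range(nbuckets + 1):
        cutoff = i / nbuckets
        fp = sum(1 for s in ham_scores if s >= cutoff)   # ham called spam
        fn = sum(1 for s in spam_scores if s < cutoff)   # spam called ham
        cost = weighted_total(fp, fn, fp_weight, fn_weight)
        if best is None or cost < best[3]:
            best = (cutoff, fp, fn, cost)
    return best

# Figures from the output above.
print(weighted_total(50, 220))            # 720
print("%.3f%%" % (100.0 * 50 / 34900))    # 0.143%
print("%.2f%%" % (100.0 * 220 / 16848))   # 1.31%
```

With a 10:1 weighting the scan is pushed toward very high cutoffs, since one
extra false positive is only worth accepting if it removes more than ten
false negatives.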