[Spambayes] Mixed combining

T. Alexander Popiel popiel@wolfskeep.com
Sat Oct 19 05:44:50 2002


I did two runs of the mixed combining.  Data is not yet indexed
on my website; perhaps tomorrow.

By my results, mixed spamprob is effectively neutral compared to
straight chi-squared.  The best cost (the cost at the best possible
cutoffs) is better, but how to choose cutoffs that achieve those costs
is no clearer than before.  With the 0.05-0.95 unsure band, the fp and
fn counts are lower than with pure chi-squared, but at the price of
about half again as many unsures.  I guess it all depends on how you
assign your costs.
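The "real cost" rows in the tables are consistent with a linear cost model of $10 per false positive, $1 per false negative and $0.20 per unsure: the first column of the first table has 2 fp, 5 fn and 46 unsures, giving $20 + $5 + $9.20 = $34.20. Those weights are inferred from the table arithmetic, not quoted from any configuration. A minimal sketch of that model, with the weights exposed as parameters so they can be re-assigned:

```python
# Assumed cost weights, inferred from the "real cost" rows in the tables
# (not read from any SpamBayes configuration file).
FP_COST = 10.00      # dollars per false positive (ham called spam)
FN_COST = 1.00       # dollars per false negative (spam called ham)
UNSURE_COST = 0.20   # dollars per message left in the unsure band


def real_cost(fp, fn, unsure,
              fp_cost=FP_COST, fn_cost=FN_COST, unsure_cost=UNSURE_COST):
    """Total cost of one test run under a linear cost model."""
    return fp * fp_cost + fn * fn_cost + unsure * unsure_cost


# 50-200 column of each table: (fp, fn, unsure).
runs = {
    "mixed, 0.10-0.90": (2, 5, 46),
    "mixed, 0.05-0.95": (2, 4, 69),
    "pure chi, 0.05-0.95": (2, 5, 49),
}
for name, counts in runs.items():
    print(f"{name}: ${real_cost(*counts):.2f}")
```

This reproduces the $34.20, $37.80 and $34.80 entries in those columns; passing a different `unsure_cost` shows how reassigning costs reorders the runs.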

Anyway, here are the tables:

Mixed, .9 chi-squared, 0.10-0.90 unsure:
-> <stat> tested 50 hams & 200 spams against 450 hams & 1800 spams
[...]
-> <stat> tested 200 hams & 50 spams against 1800 hams & 450 spams
ham:spam:   50-200  75-175 100-150 125-125 150-100  175-75  200-50
fp total:        2       3       3       3       3       2       2
fp %:         0.40    0.40    0.30    0.24    0.20    0.11    0.10
fn total:        5       6       4       5       6       7       9
fn %:         0.25    0.34    0.27    0.40    0.60    0.93    1.80
unsure t:       46      44      45      42      52      51      52
unsure %:     1.84    1.76    1.80    1.68    2.08    2.04    2.08
real cost:  $34.20  $44.80  $43.00  $43.40  $46.40  $37.20  $39.40
best cost:  $28.60  $28.20  $34.00  $33.20  $34.20  $30.40  $23.80
h mean:       3.61    2.70    2.47    2.30    2.29    2.21    1.99
h sdev:       8.09    6.15    6.13    5.93    6.13    5.84    4.79
s mean:      97.08   96.69   96.33   95.84   94.94   94.34   92.25
s sdev:       6.48    7.71    8.63   10.21   12.73   13.67   17.09
mean diff:   93.47   93.99   93.86   93.54   92.65   92.13   90.26
k:            6.42    6.78    6.36    5.80    4.91    4.72    4.13
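The score statistics in these tables are on a 0-100 scale. The mean diff row equals s mean minus h mean, and k equals that difference divided by the sum of the ham and spam standard deviations, so a larger k means better-separated score distributions. That reading is inferred from the numbers (for the 50-200 column, 93.47 / (8.09 + 6.48) ≈ 6.42), not taken from the reporting tool's source. A sketch that recomputes both rows for the table above:

```python
# Score statistics from the "Mixed, .9 chi-squared" table (0-100 scale),
# one entry per ham:spam column, 50-200 through 200-50.
h_mean = [3.61, 2.70, 2.47, 2.30, 2.29, 2.21, 1.99]
h_sdev = [8.09, 6.15, 6.13, 5.93, 6.13, 5.84, 4.79]
s_mean = [97.08, 96.69, 96.33, 95.84, 94.94, 94.34, 92.25]
s_sdev = [6.48, 7.71, 8.63, 10.21, 12.73, 13.67, 17.09]


def separation(hm, hs, sm, ss):
    """Return (mean diff, k), with k = mean diff / (ham sdev + spam sdev)."""
    mean_diff = sm - hm
    return mean_diff, mean_diff / (hs + ss)


for cols in zip(h_mean, h_sdev, s_mean, s_sdev):
    diff, k = separation(*cols)
    print(f"mean diff {diff:6.2f}   k {k:5.2f}")
```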

Mixed, .9 chi-squared, 0.05-0.95 unsure:
-> <stat> tested 50 hams & 200 spams against 450 hams & 1800 spams
[...]
-> <stat> tested 200 hams & 50 spams against 1800 hams & 450 spams
ham:spam:   50-200  75-175 100-150 125-125 150-100  175-75  200-50
fp total:        2       2       2       2       2       2       1
fp %:         0.40    0.27    0.20    0.16    0.13    0.11    0.05
fn total:        4       4       3       3       3       3       4
fn %:         0.20    0.23    0.20    0.24    0.30    0.40    0.80
unsure t:       69      71      70      73      83      82      89
unsure %:     2.76    2.84    2.80    2.92    3.32    3.28    3.56
real cost:  $37.80  $38.20  $37.00  $37.60  $39.60  $39.40  $31.80
best cost:  $28.60  $28.20  $34.00  $33.20  $34.20  $30.40  $23.80
h mean:       3.61    2.70    2.47    2.30    2.29    2.21    1.99
h sdev:       8.09    6.15    6.13    5.93    6.13    5.84    4.79
s mean:      97.08   96.69   96.33   95.84   94.94   94.34   92.25
s sdev:       6.48    7.71    8.63   10.21   12.73   13.67   17.09
mean diff:   93.47   93.99   93.86   93.54   92.65   92.13   90.26
k:            6.42    6.78    6.36    5.80    4.91    4.72    4.13
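Every row from best cost down is identical in the two mixed tables, because those rows do not depend on the chosen unsure cutoffs; only the fp, fn, unsure and real cost rows moved. Widening the band simply reclassifies messages scoring near the edges as unsure. A sketch of the three-way decision on a 0-1 score; the sample scores are made up, and treating a score exactly equal to a cutoff as unsure is an assumption:

```python
def classify(score, ham_cutoff, spam_cutoff):
    """Three-way decision on a 0-1 spam score.

    Scores strictly below ham_cutoff are ham and strictly above
    spam_cutoff are spam; anything in between (cutoffs included,
    by assumption) is unsure.
    """
    if score < ham_cutoff:
        return "ham"
    if score > spam_cutoff:
        return "spam"
    return "unsure"


# Hypothetical scores, not taken from the test runs above.
scores = [0.02, 0.07, 0.50, 0.93, 0.98]

for low, high in [(0.10, 0.90), (0.05, 0.95)]:
    labels = [classify(s, low, high) for s in scores]
    print(f"{low:.2f}-{high:.2f}: {labels}")
```

With the 0.10-0.90 band the 0.07 and 0.93 scores are decided; with 0.05-0.95 they become unsure, which is how the wider band trades fp/fn for unsures.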

And, for reference, pure chi-squared, 0.05-0.95 unsure:
-> <stat> tested 50 hams & 200 spams against 450 hams & 1800 spams
[...]
-> <stat> tested 200 hams & 50 spams against 1800 hams & 450 spams
ham:spam:   50-200  75-175 100-150 125-125 150-100  175-75  200-50
fp total:        2       3       3       3       2       2       2
fp %:         0.40    0.40    0.30    0.24    0.13    0.11    0.10
fn total:        5       6       4       5       6       7       9
fn %:         0.25    0.34    0.27    0.40    0.60    0.93    1.80
unsure t:       49      44      49      46      54      58      53
unsure %:     1.96    1.76    1.96    1.84    2.16    2.32    2.12
real cost:  $34.80  $44.80  $43.80  $44.20  $36.80  $38.60  $39.60
best cost:  $28.60  $28.40  $34.00  $35.60  $34.60  $30.60  $28.60
h mean:       1.31    0.58    0.50    0.46    0.51    0.48    0.36
h sdev:       8.51    6.47    6.46    6.25    6.44    6.12    4.97
s mean:      99.25   98.92   98.60   98.17   97.25   96.73   94.66
s sdev:       6.75    8.05    9.04   10.76   13.47   14.49   18.20
mean diff:   97.94   98.34   98.10   97.71   96.74   96.25   94.30
k:            6.42    6.77    6.33    5.74    4.86    4.67    4.07
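Summing the seven columns of the two 0.05-0.95 tables puts numbers on the trade-off described at the top: mixed combining has 13 fp and 24 fn against 17 and 42 for pure chi-squared, but 537 unsures against 353, about 1.52 times as many. Under the $10 / $1 / $0.20 weights the real cost rows appear to use (inferred, not documented in this post), mixed comes out cheaper, but the ranking flips once an unsure costs more than about $0.32. A sketch of that calculation:

```python
# Per-column counts copied from the two 0.05-0.95 tables (50-200 ... 200-50).
mixed = {"fp": [2, 2, 2, 2, 2, 2, 1],
         "fn": [4, 4, 3, 3, 3, 3, 4],
         "unsure": [69, 71, 70, 73, 83, 82, 89]}
pure_chi = {"fp": [2, 3, 3, 3, 2, 2, 2],
            "fn": [5, 6, 4, 5, 6, 7, 9],
            "unsure": [49, 44, 49, 46, 54, 58, 53]}

# Assumed weights, inferred from the "real cost" rows: $10 per fp, $1 per fn.
FP_COST, FN_COST = 10.00, 1.00


def totals(run):
    """Sum each count over all seven ham:spam splits."""
    return {key: sum(values) for key, values in run.items()}


def cost(t, unsure_cost):
    """Total cost of summed counts for a given per-unsure cost."""
    return t["fp"] * FP_COST + t["fn"] * FN_COST + t["unsure"] * unsure_cost


m, p = totals(mixed), totals(pure_chi)
unsure_ratio = m["unsure"] / p["unsure"]

# Unsure cost at which both schemes have the same total cost:
# fixed_m + U_m * u == fixed_p + U_p * u, solved for u.
fixed_m = m["fp"] * FP_COST + m["fn"] * FN_COST
fixed_p = p["fp"] * FP_COST + p["fn"] * FN_COST
break_even = (fixed_p - fixed_m) / (m["unsure"] - p["unsure"])

print(m, p)
print(f"unsure ratio {unsure_ratio:.2f}")
print(f"cost at $0.20/unsure: mixed ${cost(m, 0.20):.2f}, pure ${cost(p, 0.20):.2f}")
print(f"break-even unsure cost ${break_even:.2f}")
```

The $261.40 and $282.60 totals match the sums of the two real cost rows.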

Enjoy.

- Alex