[Spambayes] z-combining
T. Alexander Popiel
popiel@wolfskeep.com
Mon, 14 Oct 2002 14:53:06 -0700
Well, I did a z-combining run. @whee. It replaces my
all-defaults run as cv1. chi-square remains as cv2.
From results.txt:
"""
ham mean                    ham sdev
   0.50    0.50   +0.00%       7.05    7.04   -0.14%
   0.26    0.27   +3.85%       3.65    3.71   +1.64%
   0.02    0.04 +100.00%       0.29    0.41  +41.38%
   0.49    0.41  -16.33%       5.44    4.13  -24.08%
   0.38    0.36   -5.26%       5.27    4.84   -8.16%
   1.03    1.01   -1.94%       9.88    9.42   -4.66%
   0.51    0.51   +0.00%       5.56    5.47   -1.62%
   0.09    0.16  +77.78%       1.26    1.94  +53.97%
   0.97    0.95   -2.06%       9.66    9.40   -2.69%
   0.12    0.14  +16.67%       1.73    1.88   +8.67%
ham mean and sdev for all runs
   0.44    0.44   +0.00%       5.90    5.65   -4.24%

spam mean                   spam sdev
  98.68   98.42   -0.26%      10.66   10.85   +1.78%
  99.31   99.26   -0.05%       5.62    5.56   -1.07%
  97.68   97.82   +0.14%      13.94   12.18  -12.63%
  98.84   98.85   +0.01%       9.00    8.90   -1.11%
  98.54   98.55   +0.01%      11.71    9.65  -17.59%
  97.99   98.31   +0.33%      13.48   11.21  -16.84%
  96.88   97.25   +0.38%      15.83   13.12  -17.12%
  99.34   98.98   -0.36%       4.95    6.15  +24.24%
  98.07   98.26   +0.19%      11.74   10.37  -11.67%
  99.65   99.01   -0.64%       3.04    5.46  +79.61%
spam mean and sdev for all runs
  98.50   98.47   -0.03%      10.81    9.72  -10.08%

ham/spam mean difference: 98.06 98.03 -0.03
"""
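For anyone reading the table cold: the third number in each group appears to be the relative change from the first column (cv1, z-combining) to the second (cv2, chi-square). A minimal sketch of that arithmetic follows. The helper name pct_change is made up for illustration and is not taken from the Spambayes tools, and it does not handle a zero starting value.

```python
def pct_change(before, after):
    """Relative change from `before` to `after`, in percent.

    Hypothetical helper for illustration only; assumes before != 0.
    """
    return (after - before) / before * 100.0


# Values copied from the table above: (cv1, cv2) pairs.
rows = [
    (0.26, 0.27),    # a ham mean row
    (0.49, 0.41),    # another ham mean row
    (10.81, 9.72),   # spam sdev for all runs
]

for before, after in rows:
    print(f"{before:7.2f} {after:7.2f} {pct_change(before, after):+8.2f}%")
```

The three printed percentages should match the +3.85%, -16.33% and -10.08% entries in the table.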
z-combining (left columns, cv1) loses vs. chi-square (right columns,
cv2) there, with looser sdevs overall (ham 5.90 vs. 5.65, spam 10.81
vs. 9.72).
Next, we have the best-cost computations for z-combining:
"""
-> best cost $54.20
-> per-fp cost $10.00; per-fn cost $1.00; per-unsure cost $0.20
-> achieved at 6 cutoff pairs
-> smallest ham & spam cutoffs 0.01 & 0.985
-> fp 3; fn 13; unsure ham 12; unsure spam 44
-> fp rate 0.15%; fn rate 0.65%; unsure rate 1.4%
-> largest ham & spam cutoffs 0.035 & 0.985
-> fp 3; fn 13; unsure ham 12; unsure spam 44
-> fp rate 0.15%; fn rate 0.65%; unsure rate 1.4%
"""
Compare with the one from chi-square:
"""
-> best cost $48.00
-> per-fp cost $10.00; per-fn cost $1.00; per-unsure cost $0.20
-> achieved at 3 cutoff pairs
-> smallest ham & spam cutoffs 0.03 & 0.89
-> fp 3; fn 6; unsure ham 12; unsure spam 48
-> fp rate 0.15%; fn rate 0.3%; unsure rate 1.5%
-> largest ham & spam cutoffs 0.03 & 0.9
-> fp 3; fn 6; unsure ham 12; unsure spam 48
-> fp rate 0.15%; fn rate 0.3%; unsure rate 1.5%
"""
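The two best-cost figures can be checked by hand from the counts and per-error costs printed above. Here is a rough sketch of that sum. The function name total_cost is invented for this example and is not the Spambayes code; the counts and costs are copied from the two reports.

```python
def total_cost(fp, fn, unsure, fp_cost=10.00, fn_cost=1.00, unsure_cost=0.20):
    """Weighted error cost: each fp, fn, and unsure message has a fixed price.

    Hypothetical helper; default prices are the ones shown in the reports.
    """
    return fp * fp_cost + fn * fn_cost + unsure * unsure_cost


# "unsure" is unsure ham plus unsure spam.
z_cost = total_cost(fp=3, fn=13, unsure=12 + 44)    # z-combining
chi_cost = total_cost(fp=3, fn=6, unsure=12 + 48)   # chi-square

print(f"z-combining: ${z_cost:.2f}")   # $54.20
print(f"chi-square:  ${chi_cost:.2f}") # $48.00
```

Both runs have the same 3 fp, so the $6.20 gap comes from z-combining's 7 extra fn ($7.00), partly offset by 4 fewer unsures ($0.80).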
Looks like z-combining has real granularity problems near
the top end (its best spam cutoff sits at 0.985, vs. 0.89 to 0.9
for chi-square). Trash it.
- Alex