[Spambayes] Total cost analysis
Tim Peters
tim.one@comcast.net
Mon, 14 Oct 2002 13:57:03 -0400
CAUTION: For the attached histogram pair, cvcost sez:
tcap.txt: Optimal cost is $10.0 with grey zone between 89.0 and 97.0
but the new histogram analysis says:
-> best cost $0.80
-> per-fp cost $10.00; per-fn cost $1.00; per-unsure cost $0.20
-> achieved at 24 cutoff pairs
-> smallest ham & spam cutoffs 0.855 & 0.995
-> fp 0; fn 0; unsure ham 1; unsure spam 3
-> fp rate 0%; fn rate 0%; unsure rate 2%
-> largest ham & spam cutoffs 0.97 & 0.995
-> fp 0; fn 0; unsure ham 1; unsure spam 3
-> fp rate 0%; fn rate 0%; unsure rate 2%
and eyeballing the histograms shows that the latter is correct. cvcost's grey
zone of 89.0 to 97.0 leaves the ham that scored 99.23 above the spam cutoff,
so its $10.0 is exactly one false positive. I don't know why cvcost.py thinks
that's the best that can be done; I suspect it skips some cutoff pairs to
save time.
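If skipped cutoff pairs are the culprit, an exhaustive search is cheap enough to rule it out. Below is a minimal sketch, not cvcost.py's or the histogram analysis's actual code. It assumes scores on a 0-1 scale (the cutoffs above are reported that way, while the histograms use 0-100), assumes "score < ham_cutoff is ham, score >= spam_cutoff is spam, anything between is unsure", and uses the per-fp/fn/unsure costs quoted above. The names `total_cost` and `best_cutoffs` are hypothetical. Costs are kept in integer cents so that ties between cutoff pairs compare exactly.

```python
# Hypothetical per-error costs, in integer cents, matching the
# $10.00 / $1.00 / $0.20 figures quoted above.
FP_CENTS = 1000
FN_CENTS = 100
UNSURE_CENTS = 20


def total_cost(ham_scores, spam_scores, ham_cutoff, spam_cutoff):
    """Total cost in cents for one cutoff pair (scores in [0, 1]).

    Assumed rule: score < ham_cutoff -> ham, score >= spam_cutoff -> spam,
    otherwise unsure.
    """
    fp = sum(1 for s in ham_scores if s >= spam_cutoff)    # ham called spam
    fn = sum(1 for s in spam_scores if s < ham_cutoff)     # spam called ham
    unsure = sum(1 for s in list(ham_scores) + list(spam_scores)
                 if ham_cutoff <= s < spam_cutoff)
    return fp * FP_CENTS + fn * FN_CENTS + unsure * UNSURE_CENTS


def best_cutoffs(ham_scores, spam_scores, step=0.005):
    """Try every pair ham_cutoff <= spam_cutoff on a grid; skip none.

    Returns (best cost in cents, list of all pairs achieving it), with
    pairs in ascending order.  Grid points are rounded to 3 decimals,
    which assumes step is a multiple of 0.001.
    """
    n = round(1 / step)
    grid = [round(i * step, 3) for i in range(n + 1)]
    best_cost = None
    best_pairs = []
    for i, ham_cutoff in enumerate(grid):
        for spam_cutoff in grid[i:]:
            cost = total_cost(ham_scores, spam_scores, ham_cutoff, spam_cutoff)
            if best_cost is None or cost < best_cost:
                best_cost, best_pairs = cost, [(ham_cutoff, spam_cutoff)]
            elif cost == best_cost:
                best_pairs.append((ham_cutoff, spam_cutoff))
    return best_cost, best_pairs
```

With the counts reported above, four unsures and no errors come to 4 * 20 = 80 cents ($0.80), while a single false positive alone costs 1000 cents ($10.00). Collecting every tied pair, rather than only the first, is what lets a report give both the smallest and largest optimal cutoffs, as the "achieved at 24 cutoff pairs" output does.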
[Attached histogram pair]
-> <stat> Ham scores for all runs: 100 items; mean 7.21; sdev 18.87
-> <stat> min 3.34881e-009; median 0.18187; max 99.2347
* = 2 items
0.0 63 ********************************
0.5 6 ***
1.0 4 **
1.5 5 ***
2.0 1 *
2.5 0
3.0 3 **
3.5 0
4.0 0
4.5 0
5.0 0
5.5 0
6.0 0
6.5 0
7.0 0
7.5 0
8.0 0
8.5 0
9.0 1 *
9.5 1 *
10.0 0
10.5 0
11.0 0
11.5 0
12.0 1 *
12.5 0
13.0 0
13.5 0
14.0 0
14.5 1 *
15.0 0
15.5 1 *
16.0 1 *
16.5 0
17.0 0
17.5 0
18.0 0
18.5 1 *
19.0 1 *
19.5 0
20.0 0
20.5 0
21.0 0
21.5 0
22.0 0
22.5 0
23.0 0
23.5 1 *
24.0 0
24.5 0
25.0 0
25.5 0
26.0 0
26.5 0
27.0 0
27.5 1 *
28.0 0
28.5 0
29.0 0
29.5 0
30.0 0
30.5 1 *
31.0 0
31.5 0
32.0 0
32.5 0
33.0 0
33.5 0
34.0 0
34.5 0
35.0 0
35.5 0
36.0 0
36.5 0
37.0 0
37.5 0
38.0 0
38.5 0
39.0 0
39.5 0
40.0 0
40.5 0
41.0 0
41.5 0
42.0 0
42.5 0
43.0 0
43.5 0
44.0 1 *
44.5 1 *
45.0 0
45.5 0
46.0 0
46.5 0
47.0 0
47.5 0
48.0 0
48.5 0
49.0 0
49.5 0
50.0 0
50.5 0
51.0 0
51.5 0
52.0 0
52.5 0
53.0 0
53.5 0
54.0 0
54.5 0
55.0 0
55.5 0
56.0 0
56.5 0
57.0 0
57.5 0
58.0 0
58.5 0
59.0 0
59.5 0
60.0 0
60.5 0
61.0 0
61.5 0
62.0 0
62.5 0
63.0 0
63.5 1 *
64.0 0
64.5 0
65.0 0
65.5 0
66.0 0
66.5 0
67.0 0
67.5 0
68.0 0
68.5 0
69.0 0
69.5 0
70.0 0
70.5 0
71.0 0
71.5 0
72.0 0
72.5 0
73.0 0
73.5 0
74.0 0
74.5 1 *
75.0 0
75.5 0
76.0 0
76.5 0
77.0 1 *
77.5 0
78.0 0
78.5 0
79.0 0
79.5 0
80.0 0
80.5 0
81.0 0
81.5 0
82.0 0
82.5 0
83.0 0
83.5 0
84.0 0
84.5 0
85.0 1 *
85.5 0
86.0 0
86.5 0
87.0 0
87.5 0
88.0 0
88.5 0
89.0 0
89.5 0
90.0 0
90.5 0
91.0 0
91.5 0
92.0 0
92.5 0
93.0 0
93.5 0
94.0 0
94.5 0
95.0 0
95.5 0
96.0 0
96.5 0
97.0 0
97.5 0
98.0 0
98.5 0
99.0 1 *
99.5 0
-> <stat> Spam scores for all runs: 100 items; mean 99.94; sdev 0.34
-> <stat> min 97.0896; median 100; max 100
* = 2 items
0.0 0
0.5 0
1.0 0
1.5 0
2.0 0
2.5 0
3.0 0
3.5 0
4.0 0
4.5 0
5.0 0
5.5 0
6.0 0
6.5 0
7.0 0
7.5 0
8.0 0
8.5 0
9.0 0
9.5 0
10.0 0
10.5 0
11.0 0
11.5 0
12.0 0
12.5 0
13.0 0
13.5 0
14.0 0
14.5 0
15.0 0
15.5 0
16.0 0
16.5 0
17.0 0
17.5 0
18.0 0
18.5 0
19.0 0
19.5 0
20.0 0
20.5 0
21.0 0
21.5 0
22.0 0
22.5 0
23.0 0
23.5 0
24.0 0
24.5 0
25.0 0
25.5 0
26.0 0
26.5 0
27.0 0
27.5 0
28.0 0
28.5 0
29.0 0
29.5 0
30.0 0
30.5 0
31.0 0
31.5 0
32.0 0
32.5 0
33.0 0
33.5 0
34.0 0
34.5 0
35.0 0
35.5 0
36.0 0
36.5 0
37.0 0
37.5 0
38.0 0
38.5 0
39.0 0
39.5 0
40.0 0
40.5 0
41.0 0
41.5 0
42.0 0
42.5 0
43.0 0
43.5 0
44.0 0
44.5 0
45.0 0
45.5 0
46.0 0
46.5 0
47.0 0
47.5 0
48.0 0
48.5 0
49.0 0
49.5 0
50.0 0
50.5 0
51.0 0
51.5 0
52.0 0
52.5 0
53.0 0
53.5 0
54.0 0
54.5 0
55.0 0
55.5 0
56.0 0
56.5 0
57.0 0
57.5 0
58.0 0
58.5 0
59.0 0
59.5 0
60.0 0
60.5 0
61.0 0
61.5 0
62.0 0
62.5 0
63.0 0
63.5 0
64.0 0
64.5 0
65.0 0
65.5 0
66.0 0
66.5 0
67.0 0
67.5 0
68.0 0
68.5 0
69.0 0
69.5 0
70.0 0
70.5 0
71.0 0
71.5 0
72.0 0
72.5 0
73.0 0
73.5 0
74.0 0
74.5 0
75.0 0
75.5 0
76.0 0
76.5 0
77.0 0
77.5 0
78.0 0
78.5 0
79.0 0
79.5 0
80.0 0
80.5 0
81.0 0
81.5 0
82.0 0
82.5 0
83.0 0
83.5 0
84.0 0
84.5 0
85.0 0
85.5 0
86.0 0
86.5 0
87.0 0
87.5 0
88.0 0
88.5 0
89.0 0
89.5 0
90.0 0
90.5 0
91.0 0
91.5 0
92.0 0
92.5 0
93.0 0
93.5 0
94.0 0
94.5 0
95.0 0
95.5 0
96.0 0
96.5 0
97.0 1 *
97.5 0
98.0 0
98.5 2 *
99.0 0
99.5 97 *************************************************