[Spambayes] Total cost analysis

Tim Peters tim.one@comcast.net
Mon, 14 Oct 2002 13:57:03 -0400


CAUTION:  For the attached histogram pair, cvcost sez:

    tcap.txt: Optimal cost is $10.0 with grey zone between 89.0 and 97.0

but the new histogram analysis says:

-> best cost $0.80
-> per-fp cost $10.00; per-fn cost $1.00; per-unsure cost $0.20
-> achieved at 24 cutoff pairs
-> smallest ham & spam cutoffs 0.855 & 0.995
->     fp 0; fn 0; unsure ham 1; unsure spam 3
->     fp rate 0%; fn rate 0%; unsure rate 2%
-> largest ham & spam cutoffs 0.97 & 0.995
->     fp 0; fn 0; unsure ham 1; unsure spam 3
->     fp rate 0%; fn rate 0%; unsure rate 2%

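and eyeballing the histograms shows that the latter is correct:  the one ham
in the 99.0 bucket (max 99.2347) stays unsure under a spam cutoff of 0.995,
where cvcost's spam cutoff of 97.0 turns it into a $10.00 false positive.  I
don't know why cvcost.py thinks $10.00 is the best that can be done; I suspect
it's because it's skipping some cutoff pairs in order to save time.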
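
For anyone who wants to check the numbers by hand, here is a minimal sketch of
an exhaustive search over every cutoff pair on the bucketed data.  It is not
the code of cvcost.py or of the histogram analysis.  The names (HAM, SPAM,
to_counts, cost_cents, search) are made up for illustration, and the boundary
rule is an assumption:  a bucket counts as ham if it lies below the ham
cutoff, as spam if it lies at or above the spam cutoff, and as unsure
otherwise.  Costs are kept in integer cents so that ties compare exactly.

```python
# Hypothetical sketch: brute-force search over all (ham_cutoff, spam_cutoff)
# pairs, using only the bucket counts from the attached histograms.

FP_CENTS, FN_CENTS, UNSURE_CENTS = 1000, 100, 20  # $10.00, $1.00, $0.20
NBUCKETS = 200  # buckets of width 0.5 on the histogram's 0..100 scale

# Nonzero buckets copied from the attachment: {bucket lower bound: count}.
HAM = {0.0: 63, 0.5: 6, 1.0: 4, 1.5: 5, 2.0: 1, 3.0: 3, 9.0: 1, 9.5: 1,
       12.0: 1, 14.5: 1, 15.5: 1, 16.0: 1, 18.5: 1, 19.0: 1, 23.5: 1,
       27.5: 1, 30.5: 1, 44.0: 1, 44.5: 1, 63.5: 1, 74.5: 1, 77.0: 1,
       85.0: 1, 99.0: 1}
SPAM = {97.0: 1, 98.5: 2, 99.5: 97}


def to_counts(hist):
    """Expand a sparse {lower bound: count} dict into a dense bucket list."""
    counts = [0] * NBUCKETS
    for lower, n in hist.items():
        counts[round(lower * 2)] = n
    return counts


def cost_cents(ham, spam, h, s):
    """Cost of cutoffs (h, s), given as bucket indices with h <= s.

    Assumed rule: bucket < h is ham, bucket >= s is spam, else unsure.
    """
    fp = sum(ham[s:])                          # ham called spam
    fn = sum(spam[:h])                         # spam called ham
    unsure = sum(ham[h:s]) + sum(spam[h:s])    # anything in the grey zone
    return FP_CENTS * fp + FN_CENTS * fn + UNSURE_CENTS * unsure


def search(ham_hist, spam_hist):
    """Try every pair; return (best cost in cents, list of winning pairs)."""
    ham, spam = to_counts(ham_hist), to_counts(spam_hist)
    best, pairs = None, []
    for h in range(NBUCKETS + 1):
        for s in range(h, NBUCKETS + 1):
            c = cost_cents(ham, spam, h, s)
            if best is None or c < best:
                best, pairs = c, [(h, s)]
            elif c == best:
                pairs.append((h, s))
    return best, pairs


if __name__ == "__main__":
    best, pairs = search(HAM, SPAM)
    print("best cost $%.2f at %d cutoff pairs" % (best / 100, len(pairs)))
    for h, s in (pairs[0], pairs[-1]):
        # Bucket index / 200 gives the cutoff on the 0..1 scale.
        print("ham & spam cutoffs %g & %g" % (h / 200, s / 200))
```

Under that assumed boundary rule, the search lands on the same $0.80 and the
same 24 pairs as the histogram analysis, and cost_cents(ham, spam, 178, 194),
which is cvcost's grey zone of 89.0 to 97.0, comes out to 1000 cents.  Because
no pair is skipped, a gap between this and cvcost.py would point at the search
grid rather than at the cost formula.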

---------------------- multipart/mixed attachment

-> <stat> Ham scores for all runs: 100 items; mean 7.21; sdev 18.87
-> <stat> min 3.34881e-009; median 0.18187; max 99.2347
* = 2 items
 0.0 63 ********************************
 0.5  6 ***
 1.0  4 **
 1.5  5 ***
 2.0  1 *
 2.5  0 
 3.0  3 **
 3.5  0 
 4.0  0 
 4.5  0 
 5.0  0 
 5.5  0 
 6.0  0 
 6.5  0 
 7.0  0 
 7.5  0 
 8.0  0 
 8.5  0 
 9.0  1 *
 9.5  1 *
10.0  0 
10.5  0 
11.0  0 
11.5  0 
12.0  1 *
12.5  0 
13.0  0 
13.5  0 
14.0  0 
14.5  1 *
15.0  0 
15.5  1 *
16.0  1 *
16.5  0 
17.0  0 
17.5  0 
18.0  0 
18.5  1 *
19.0  1 *
19.5  0 
20.0  0 
20.5  0 
21.0  0 
21.5  0 
22.0  0 
22.5  0 
23.0  0 
23.5  1 *
24.0  0 
24.5  0 
25.0  0 
25.5  0 
26.0  0 
26.5  0 
27.0  0 
27.5  1 *
28.0  0 
28.5  0 
29.0  0 
29.5  0 
30.0  0 
30.5  1 *
31.0  0 
31.5  0 
32.0  0 
32.5  0 
33.0  0 
33.5  0 
34.0  0 
34.5  0 
35.0  0 
35.5  0 
36.0  0 
36.5  0 
37.0  0 
37.5  0 
38.0  0 
38.5  0 
39.0  0 
39.5  0 
40.0  0 
40.5  0 
41.0  0 
41.5  0 
42.0  0 
42.5  0 
43.0  0 
43.5  0 
44.0  1 *
44.5  1 *
45.0  0 
45.5  0 
46.0  0 
46.5  0 
47.0  0 
47.5  0 
48.0  0 
48.5  0 
49.0  0 
49.5  0 
50.0  0 
50.5  0 
51.0  0 
51.5  0 
52.0  0 
52.5  0 
53.0  0 
53.5  0 
54.0  0 
54.5  0 
55.0  0 
55.5  0 
56.0  0 
56.5  0 
57.0  0 
57.5  0 
58.0  0 
58.5  0 
59.0  0 
59.5  0 
60.0  0 
60.5  0 
61.0  0 
61.5  0 
62.0  0 
62.5  0 
63.0  0 
63.5  1 *
64.0  0 
64.5  0 
65.0  0 
65.5  0 
66.0  0 
66.5  0 
67.0  0 
67.5  0 
68.0  0 
68.5  0 
69.0  0 
69.5  0 
70.0  0 
70.5  0 
71.0  0 
71.5  0 
72.0  0 
72.5  0 
73.0  0 
73.5  0 
74.0  0 
74.5  1 *
75.0  0 
75.5  0 
76.0  0 
76.5  0 
77.0  1 *
77.5  0 
78.0  0 
78.5  0 
79.0  0 
79.5  0 
80.0  0 
80.5  0 
81.0  0 
81.5  0 
82.0  0 
82.5  0 
83.0  0 
83.5  0 
84.0  0 
84.5  0 
85.0  1 *
85.5  0 
86.0  0 
86.5  0 
87.0  0 
87.5  0 
88.0  0 
88.5  0 
89.0  0 
89.5  0 
90.0  0 
90.5  0 
91.0  0 
91.5  0 
92.0  0 
92.5  0 
93.0  0 
93.5  0 
94.0  0 
94.5  0 
95.0  0 
95.5  0 
96.0  0 
96.5  0 
97.0  0 
97.5  0 
98.0  0 
98.5  0 
99.0  1 *
99.5  0 

-> <stat> Spam scores for all runs: 100 items; mean 99.94; sdev 0.34
-> <stat> min 97.0896; median 100; max 100
* = 2 items
 0.0  0 
 0.5  0 
 1.0  0 
 1.5  0 
 2.0  0 
 2.5  0 
 3.0  0 
 3.5  0 
 4.0  0 
 4.5  0 
 5.0  0 
 5.5  0 
 6.0  0 
 6.5  0 
 7.0  0 
 7.5  0 
 8.0  0 
 8.5  0 
 9.0  0 
 9.5  0 
10.0  0 
10.5  0 
11.0  0 
11.5  0 
12.0  0 
12.5  0 
13.0  0 
13.5  0 
14.0  0 
14.5  0 
15.0  0 
15.5  0 
16.0  0 
16.5  0 
17.0  0 
17.5  0 
18.0  0 
18.5  0 
19.0  0 
19.5  0 
20.0  0 
20.5  0 
21.0  0 
21.5  0 
22.0  0 
22.5  0 
23.0  0 
23.5  0 
24.0  0 
24.5  0 
25.0  0 
25.5  0 
26.0  0 
26.5  0 
27.0  0 
27.5  0 
28.0  0 
28.5  0 
29.0  0 
29.5  0 
30.0  0 
30.5  0 
31.0  0 
31.5  0 
32.0  0 
32.5  0 
33.0  0 
33.5  0 
34.0  0 
34.5  0 
35.0  0 
35.5  0 
36.0  0 
36.5  0 
37.0  0 
37.5  0 
38.0  0 
38.5  0 
39.0  0 
39.5  0 
40.0  0 
40.5  0 
41.0  0 
41.5  0 
42.0  0 
42.5  0 
43.0  0 
43.5  0 
44.0  0 
44.5  0 
45.0  0 
45.5  0 
46.0  0 
46.5  0 
47.0  0 
47.5  0 
48.0  0 
48.5  0 
49.0  0 
49.5  0 
50.0  0 
50.5  0 
51.0  0 
51.5  0 
52.0  0 
52.5  0 
53.0  0 
53.5  0 
54.0  0 
54.5  0 
55.0  0 
55.5  0 
56.0  0 
56.5  0 
57.0  0 
57.5  0 
58.0  0 
58.5  0 
59.0  0 
59.5  0 
60.0  0 
60.5  0 
61.0  0 
61.5  0 
62.0  0 
62.5  0 
63.0  0 
63.5  0 
64.0  0 
64.5  0 
65.0  0 
65.5  0 
66.0  0 
66.5  0 
67.0  0 
67.5  0 
68.0  0 
68.5  0 
69.0  0 
69.5  0 
70.0  0 
70.5  0 
71.0  0 
71.5  0 
72.0  0 
72.5  0 
73.0  0 
73.5  0 
74.0  0 
74.5  0 
75.0  0 
75.5  0 
76.0  0 
76.5  0 
77.0  0 
77.5  0 
78.0  0 
78.5  0 
79.0  0 
79.5  0 
80.0  0 
80.5  0 
81.0  0 
81.5  0 
82.0  0 
82.5  0 
83.0  0 
83.5  0 
84.0  0 
84.5  0 
85.0  0 
85.5  0 
86.0  0 
86.5  0 
87.0  0 
87.5  0 
88.0  0 
88.5  0 
89.0  0 
89.5  0 
90.0  0 
90.5  0 
91.0  0 
91.5  0 
92.0  0 
92.5  0 
93.0  0 
93.5  0 
94.0  0 
94.5  0 
95.0  0 
95.5  0 
96.0  0 
96.5  0 
97.0  1 *
97.5  0 
98.0  0 
98.5  2 *
99.0  0 
99.5 97 *************************************************
-> best cost $0.80
-> per-fp cost $10.00; per-fn cost $1.00; per-unsure cost $0.20
-> achieved at 24 cutoff pairs
-> smallest ham & spam cutoffs 0.855 & 0.995
->     fp 0; fn 0; unsure ham 1; unsure spam 3
->     fp rate 0%; fn rate 0%; unsure rate 2%
-> largest ham & spam cutoffs 0.97 & 0.995
->     fp 0; fn 0; unsure ham 1; unsure spam 3
->     fp rate 0%; fn rate 0%; unsure rate 2%

C:\Code\spambayes>tcap/u
