[Spambayes] CL1 tests
Brad Clements
bkc@murkworks.com
Sun, 06 Oct 2002 22:23:02 -0400
Two tests, A and B, using cl1 and rmspik.
The output isn't formatted too well; I'm doing this via VNC.
Set A
-> <stat> Ham scores for all in this training set: 6500 items; mean 1.99; sdev 10.31
-> <stat> min 0; median 0; max 100
* = 103 items
0 6254 *************************************************************
25 169 **
50 62 *
75 15 *
-> <stat> Spam scores for all in this training set: 6500 items; mean 99.08; sdev 6.76
-> <stat> min 0; median 100; max 100
* = 105 items
0 2 *
25 9 *
50 108 **
75 6381 *************************************************************
-> best cutoff for all in this training set: 0.5
-> with weighted total 1*77 fp + 11 fn = 88
-> fp rate 1.18% fn rate 0.169%
saving pickle to class1.pik
-> <stat> Ham scores for all runs: 6500 items; mean 1.99; sdev 10.31
-> <stat> min 0; median 0; max 100
* = 103 items
0 6254 *************************************************************
25 169 **
50 62 *
75 15 *
-> <stat> Spam scores for all runs: 6500 items; mean 99.08; sdev 6.76
-> <stat> min 0; median 100; max 100
* = 105 items
0 2 *
25 9 *
50 108 **
75 6381 *************************************************************
-> best cutoff for all runs: 0.5
-> with weighted total 1*77 fp + 11 fn = 88
-> fp rate 1.18% fn rate 0.169%
saving ham histogram pickle to class_hamhist.pik
saving spam histogram pickle to class_spamhist.pik
Saving all score data to pickle clim.pik
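
As a cross-check on the "best cutoff" summary above, the weighted total and the two error rates can be recomputed from the raw counts. This is a minimal sketch, not Spambayes code: the function name `cutoff_summary` and the `fp_weight` parameter are made up for illustration, and it assumes each rate is simply the error count divided by the 6500 messages of that class.

```python
def cutoff_summary(n_fp, n_fn, n_ham, n_spam, fp_weight=1):
    """Recompute the summary figures from raw error counts.

    fp_weight is a hypothetical name for the multiplier printed as "1*"
    in the report line "1*77 fp + 11 fn = 88".
    """
    weighted_total = fp_weight * n_fp + n_fn
    fp_rate = 100.0 * n_fp / n_ham    # percent of ham scored as spam
    fn_rate = 100.0 * n_fn / n_spam   # percent of spam scored as ham
    return weighted_total, fp_rate, fn_rate


# Set A counts, taken from the report above.
total, fp_rate, fn_rate = cutoff_summary(n_fp=77, n_fn=11, n_ham=6500, n_spam=6500)
print("weighted total %d, fp rate %.3g%%, fn rate %.3g%%" % (total, fp_rate, fn_rate))
# prints: weighted total 88, fp rate 1.18%, fn rate 0.169%
```

With a weight of 1, the weighted total is just the number of misclassified messages.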
Reading results/cl1-a/clim.pik ...
Nham= 6500
RmsZham= 4.15398302786
Nspam= 6500
RmsZspam= 4.50455044819
======================================================================
HAM:
FALSE POSITIVE: zham=3.62 zspam=-1.05 Data/Ham/Set7/9964 SURE!
Sure/ok 6236
Unsure/ok 232
Unsure/not ok 31
Sure/not ok 1
Unsure rate = 4.05%
Sure fp rate = 0.02%; Unsure fp rate = 11.79%
======================================================================
SPAM:
FALSE NEGATIVE: zham=0.91 zspam=-3.42 Data/Spam/Set10/9656 SURE!
Sure/ok 6144
Unsure/ok 336
Unsure/not ok 19
Sure/not ok 1
Unsure rate = 5.46%
Sure fn rate = 0.02%; Unsure fn rate = 5.35%
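
The percentages in the HAM and SPAM summaries above follow from the four counts in each block. Here is a rough sketch of that arithmetic, with the denominators inferred from the numbers rather than taken from the rmspik source: the unsure rate is the share of all messages marked unsure, the unsure error rate is taken over unsure messages only, and the sure error rate is assumed to be taken over all messages. Dividing by the sure messages only would round to the same two decimals here, so the printout cannot tell the two apart. The function name is hypothetical.

```python
def sure_unsure_rates(sure_ok, unsure_ok, unsure_bad, sure_bad):
    """Return (unsure rate, sure error rate, unsure error rate) in percent."""
    total = sure_ok + unsure_ok + unsure_bad + sure_bad
    unsure = unsure_ok + unsure_bad
    unsure_rate = 100.0 * unsure / total
    # Assumption: the sure error rate uses all messages as its denominator.
    sure_err_rate = 100.0 * sure_bad / total
    # Errors among unsure messages only; guard against an empty unsure group.
    unsure_err_rate = 100.0 * unsure_bad / unsure if unsure else 0.0
    return unsure_rate, sure_err_rate, unsure_err_rate


# Set A counts, taken from the HAM and SPAM blocks above.
ham = sure_unsure_rates(6236, 232, 31, 1)
spam = sure_unsure_rates(6144, 336, 19, 1)
print("ham:  unsure %.2f%%, sure fp %.2f%%, unsure fp %.2f%%" % ham)
print("spam: unsure %.2f%%, sure fn %.2f%%, unsure fn %.2f%%" % spam)
# prints:
# ham:  unsure 4.05%, sure fp 0.02%, unsure fp 11.79%
# spam: unsure 5.46%, sure fn 0.02%, unsure fn 5.35%
```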
Set B
-> <stat> Ham scores for all in this training set: 6500 items; mean 1.72; sdev 9.38
-> <stat> min 0; median 0; max 100
* = 103 items
0 6282 *************************************************************
25 173 **
50 37 *
75 8 *
-> <stat> Spam scores for all in this training set: 6500 items; mean 99.16; sdev 6.43
-> <stat> min 0; median 100; max 100
* = 105 items
0 1 *
25 10 *
50 99 *
75 6390 *************************************************************
-> best cutoff for all in this training set: 0.5
-> with weighted total 1*45 fp + 11 fn = 56
-> fp rate 0.692% fn rate 0.169%
saving pickle to class1.pik
-> <stat> Ham scores for all runs: 6500 items; mean 1.72; sdev 9.38
-> <stat> min 0; median 0; max 100
* = 103 items
0 6282 *************************************************************
25 173 **
50 37 *
75 8 *
-> <stat> Spam scores for all runs: 6500 items; mean 99.16; sdev 6.43
-> <stat> min 0; median 100; max 100
* = 105 items
0 1 *
25 10 *
50 99 *
75 6390 *************************************************************
-> best cutoff for all runs: 0.5
-> with weighted total 1*45 fp + 11 fn = 56
-> fp rate 0.692% fn rate 0.169%
saving ham histogram pickle to class_hamhist.pik
saving spam histogram pickle to class_spamhist.pik
Saving all score data to pickle clim.pik
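
The bar lengths in the four-bucket histograms above look consistent with a simple rule: choose the per-star unit so that the largest bucket fits in a fixed width, then round every non-empty bucket up to a whole star. The sketch below assumes a maximum width of 61 stars and ceiling rounding. Both are inferred from the printed counts, not copied from the Spambayes histogram code, and the function names are made up for illustration.

```python
def star_unit(counts, max_width=61):
    """Items represented by one star, so the largest bucket fits max_width."""
    biggest = max(counts)
    return max(1, -(-biggest // max_width))  # ceiling division, at least 1


def render_histogram(counts, labels, max_width=61):
    """Return the histogram as a list of text lines."""
    unit = star_unit(counts, max_width)
    lines = ["* = %d items" % unit]
    for label, n in zip(labels, counts):
        stars = -(-n // unit)  # round up, so any non-empty bucket gets a star
        lines.append("%d %d %s" % (label, n, "*" * stars))
    return lines


# Set B ham bucket counts, taken from the report above.
for line in render_histogram([6282, 173, 37, 8], [0, 25, 50, 75]):
    print(line)
```

Because of the rounding up, a bucket holding a single message still shows one star, which is why the sparse buckets never appear empty.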
Reading results/cl1-b/clim.pik ...
Nham= 6500
RmsZham= 4.43688346925
Nspam= 6500
RmsZspam= 4.49901192821
======================================================================
HAM:
FALSE POSITIVE: zham=8.00 zspam=-1.23 Data/Ham/Set1/10180 SURE!
FALSE POSITIVE: zham=3.55 zspam=-1.56 Data/Ham/Set4/69 SURE!
FALSE POSITIVE: zham=8.59 zspam=-0.61 Data/Ham/Set4/10008 SURE!
FALSE POSITIVE: zham=8.86 zspam=-0.32 Data/Ham/Set5/5105 SURE!
Sure/ok 6251
Unsure/ok 193
Unsure/not ok 52
Sure/not ok 4
Unsure rate = 3.77%
Sure fp rate = 0.06%; Unsure fp rate = 21.22%
======================================================================
SPAM:
FALSE NEGATIVE: zham=0.60 zspam=-3.25 Data/Spam/Set2/5185 SURE!
FALSE NEGATIVE: zham=0.19 zspam=-9.53 Data/Spam/Set3/3010 SURE!
Sure/ok 6131
Unsure/ok 337
Unsure/not ok 30
Sure/not ok 2
Unsure rate = 5.65%
Sure fn rate = 0.03%; Unsure fn rate = 8.17%
[Tokenizer]
mine_received_headers: True
[Classifier]
use_central_limit = True
use_central_limit2 = False
use_central_limit3 = False
zscore_ratio_cutoff: 1.9
[TestDriver]
spam_cutoff: 0.50
show_false_negatives: True
nbuckets: 4
show_spam_lo: 0.0
show_spam_hi: 0.45
save_trained_pickles: True
save_histogram_pickles: True
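
The option listing above mixes `name: value` and `name = value` lines. Python's standard `configparser` module accepts both delimiters by default, so a listing like this can be loaded for inspection as sketched below. This uses only the standard library and does not reproduce Spambayes's own option handling; the variable names are illustrative, and only a few of the options are copied in.

```python
import configparser

# A subset of the options listed above, kept in their original spelling.
INI_TEXT = """\
[Tokenizer]
mine_received_headers: True
[Classifier]
use_central_limit = True
use_central_limit2 = False
zscore_ratio_cutoff: 1.9
[TestDriver]
spam_cutoff: 0.50
nbuckets: 4
"""

parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)

# Typed getters convert the raw strings to bool, float, and int.
use_clt = parser.getboolean("Classifier", "use_central_limit")
zscore_cutoff = parser.getfloat("Classifier", "zscore_ratio_cutoff")
spam_cutoff = parser.getfloat("TestDriver", "spam_cutoff")
nbuckets = parser.getint("TestDriver", "nbuckets")

print(use_clt, zscore_cutoff, spam_cutoff, nbuckets)
```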
Brad Clements, bkc@murkworks.com (315)268-1000
http://www.murkworks.com (315)268-9812 Fax
AOL-IM: BKClements