[Spambayes] CL1 tests

Brad Clements bkc@murkworks.com
Sun, 06 Oct 2002 22:23:02 -0400


Two tests, A and B, using cl1 and rmspik.

Not formatted too well; I'm doing this via VNC.


Set A

-> <stat> Ham scores for all in this training set: 6500 items; mean 1.99; sdev 10.31
-> <stat> min 0; median 0; max 100
* = 103 items
  0 6254 *************************************************************
 25  169 **
 50   62 *
 75   15 *

-> <stat> Spam scores for all in this training set: 6500 items; mean 99.08; sdev 6.76
-> <stat> min 0; median 100; max 100
* = 105 items
  0    2 *
 25    9 *
 50  108 **
 75 6381 *************************************************************
-> best cutoff for all in this training set: 0.5
->     with weighted total 1*77 fp + 11 fn = 88
->     fp rate 1.18%  fn rate 0.169%
    saving pickle to class1.pik

-> <stat> Ham scores for all runs: 6500 items; mean 1.99; sdev 10.31
-> <stat> min 0; median 0; max 100
* = 103 items
  0 6254 *************************************************************
 25  169 **
 50   62 *
 75   15 *

-> <stat> Spam scores for all runs: 6500 items; mean 99.08; sdev 6.76
-> <stat> min 0; median 100; max 100
* = 105 items
  0    2 *
 25    9 *
 50  108 **
 75 6381 *************************************************************
-> best cutoff for all runs: 0.5
->     with weighted total 1*77 fp + 11 fn = 88
->     fp rate 1.18%  fn rate 0.169%
    saving ham histogram pickle to class_hamhist.pik
    saving spam histogram pickle to class_spamhist.pik
Saving all score data to pickle clim.pik
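For reference, the "best cutoff" figures come straight from the 4-bucket histograms. At a cutoff of 0.5, the false positives are the ham in the 50 and 75 buckets (62 + 15 = 77). The false negatives are the spam in the 0 and 25 buckets (2 + 9 = 11). Each count is divided by the 6500 messages in its class. Below is a minimal Python sketch of that arithmetic. It assumes the fp weight of 1 shown in the log, and `cutoff_summary` is a hypothetical helper, not part of the Spambayes test driver:

```python
# Hypothetical reconstruction of the "best cutoff" lines from the 4-bucket
# histograms. Only valid when the cutoff lands on a bucket boundary (0.5
# here); the fp weight of 1 matches the "1*77" in the log.

def cutoff_summary(ham_buckets, spam_buckets, cutoff=0.5, fp_weight=1):
    """Count fp/fn at `cutoff` and return the weighted total and rates (%)."""
    nbuckets = len(ham_buckets)
    split = round(cutoff * nbuckets)       # index of first bucket at/above cutoff
    fp = sum(ham_buckets[split:])          # ham scored at or above the cutoff
    fn = sum(spam_buckets[:split])         # spam scored below the cutoff
    fp_rate = 100.0 * fp / sum(ham_buckets)
    fn_rate = 100.0 * fn / sum(spam_buckets)
    return {
        "fp": fp,
        "fn": fn,
        "total": fp_weight * fp + fn,
        "fp_rate": f"{fp_rate:.3g}%",      # 3 significant digits, as in the log
        "fn_rate": f"{fn_rate:.3g}%",
    }

if __name__ == "__main__":
    # Set A bucket counts (0, 25, 50, 75) copied from the output above.
    print(cutoff_summary([6254, 169, 62, 15], [2, 9, 108, 6381]))
```

The same call with the Set B counts gives 45 fp + 11 fn = 56, matching that run's summary.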
Reading results/cl1-a/clim.pik ...
Nham= 6500
RmsZham= 4.15398302786
Nspam= 6500
RmsZspam= 4.50455044819
======================================================================
HAM:
FALSE POSITIVE: zham=3.62 zspam=-1.05 Data/Ham/Set7/9964 SURE!
Sure/ok       6236
Unsure/ok     232
Unsure/not ok 31
Sure/not ok   1
Unsure rate = 4.05%
Sure fp rate = 0.02%; Unsure fp rate = 11.79%
======================================================================
SPAM:
FALSE NEGATIVE: zham=0.91 zspam=-3.42 Data/Spam/Set10/9656 SURE!
Sure/ok       6144
Unsure/ok     336
Unsure/not ok 19
Sure/not ok   1
Unsure rate = 5.46%
Sure fn rate = 0.02%; Unsure fn rate = 5.35%
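The rmspik summary rates can be rebuilt from the four outcome counts. The unsure rate is all unsure messages over the 6500 in the class. The sure and unsure error rates are the "not ok" counts over the sure and unsure totals respectively (e.g. 31 / 263 = 11.79%). These formulas are inferred from the numbers above rather than taken from rmspik itself, and `outcome_rates` is a hypothetical name:

```python
# Reconstruction of the rmspik summary rates from the four outcome counts.
# The formulas are inferred from the printed numbers, not copied from rmspik.

def outcome_rates(sure_ok, unsure_ok, unsure_bad, sure_bad):
    """Return (unsure rate, sure error rate, unsure error rate) in percent."""
    sure = sure_ok + sure_bad
    unsure = unsure_ok + unsure_bad
    total = sure + unsure
    return (
        round(100.0 * unsure / total, 2),       # share of messages marked unsure
        round(100.0 * sure_bad / sure, 2),      # errors among "sure" calls
        round(100.0 * unsure_bad / unsure, 2),  # errors among "unsure" calls
    )

if __name__ == "__main__":
    # Arguments follow the order of the log: Sure/ok, Unsure/ok,
    # Unsure/not ok, Sure/not ok.
    print(outcome_rates(6236, 232, 31, 1))   # Set A ham
    print(outcome_rates(6144, 336, 19, 1))   # Set A spam
```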




Set B


-> <stat> Ham scores for all in this training set: 6500 items; mean 1.72; sdev 9.38
-> <stat> min 0; median 0; max 100
* = 103 items
  0 6282 *************************************************************
 25  173 **
 50   37 *
 75    8 *

-> <stat> Spam scores for all in this training set: 6500 items; mean 99.16; sdev 6.43
-> <stat> min 0; median 100; max 100
* = 105 items
  0    1 *
 25   10 *
 50   99 *
 75 6390 *************************************************************
-> best cutoff for all in this training set: 0.5
->     with weighted total 1*45 fp + 11 fn = 56
->     fp rate 0.692%  fn rate 0.169%
    saving pickle to class1.pik

-> <stat> Ham scores for all runs: 6500 items; mean 1.72; sdev 9.38
-> <stat> min 0; median 0; max 100
* = 103 items
  0 6282 *************************************************************
 25  173 **
 50   37 *
 75    8 *

-> <stat> Spam scores for all runs: 6500 items; mean 99.16; sdev 6.43
-> <stat> min 0; median 100; max 100
* = 105 items
  0    1 *
 25   10 *
 50   99 *
 75 6390 *************************************************************
-> best cutoff for all runs: 0.5
->     with weighted total 1*45 fp + 11 fn = 56
->     fp rate 0.692%  fn rate 0.169%
    saving ham histogram pickle to class_hamhist.pik
    saving spam histogram pickle to class_spamhist.pik
Saving all score data to pickle clim.pik
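The star scaling in these histograms appears to work like this. The fullest bucket gets 61 stars, and each star stands for ceil(max / 61) items (6282 / 61 rounds up to 103). Any non-empty bucket gets at least one star, which is why 37 and 8 both show a single `*`. The sketch below is inferred from the output and assumes the 61-column width; it is not the Spambayes histogram code:

```python
# Sketch of how the text histograms appear to be scaled. The width of 61
# and the round-up rule are inferred from the output, not taken from the
# Spambayes source.
import math

WIDTH = 61  # assumed maximum number of stars per row

def render_histogram(counts, width=WIDTH):
    """Return the "* = N items" header and one row per bucket."""
    nbuckets = len(counts)
    per_star = math.ceil(max(counts) / width)   # items represented by one star
    lines = [f"* = {per_star} items"]
    for i, n in enumerate(counts):
        lo = 100 * i // nbuckets                # bucket lower bound: 0, 25, 50, 75
        stars = math.ceil(n / per_star)         # round up so small buckets show
        lines.append(f"{lo:3d} {n:4d} {'*' * stars}")
    return lines

if __name__ == "__main__":
    # Set B ham counts copied from the output above.
    print("\n".join(render_histogram([6282, 173, 37, 8])))
```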
Reading results/cl1-b/clim.pik ...
Nham= 6500
RmsZham= 4.43688346925
Nspam= 6500
RmsZspam= 4.49901192821
======================================================================
HAM:
FALSE POSITIVE: zham=8.00 zspam=-1.23 Data/Ham/Set1/10180 SURE!
FALSE POSITIVE: zham=3.55 zspam=-1.56 Data/Ham/Set4/69 SURE!
FALSE POSITIVE: zham=8.59 zspam=-0.61 Data/Ham/Set4/10008 SURE!
FALSE POSITIVE: zham=8.86 zspam=-0.32 Data/Ham/Set5/5105 SURE!
Sure/ok       6251
Unsure/ok     193
Unsure/not ok 52
Sure/not ok   4
Unsure rate = 3.77%
Sure fp rate = 0.06%; Unsure fp rate = 21.22%
======================================================================
SPAM:
FALSE NEGATIVE: zham=0.60 zspam=-3.25 Data/Spam/Set2/5185 SURE!
FALSE NEGATIVE: zham=0.19 zspam=-9.53 Data/Spam/Set3/3010 SURE!
Sure/ok       6131
Unsure/ok     337
Unsure/not ok 30
Sure/not ok   2
Unsure rate = 5.65%
Sure fn rate = 0.03%; Unsure fn rate = 8.17%
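One reading of the "SURE!" flag fits every line above. The message goes to whichever class its z-score is closer to (smaller absolute value). The call counts as sure when the other z-score is at least `zscore_ratio_cutoff` (1.9 in the options below) times larger. For Data/Ham/Set4/69, 3.55 / 1.56 is about 2.28, so it is a sure spam call. This is an assumption checked only against these lines, not the classifier's actual decision code, and `judge` is a hypothetical helper:

```python
# Hypothetical reading of the "SURE!" flag, reconstructed from the
# FALSE POSITIVE / FALSE NEGATIVE lines. This is an assumption that fits
# those lines, not the classifier's own decision code.
import re

LINE_RE = re.compile(r"zham=(-?[\d.]+) zspam=(-?[\d.]+) (\S+)")

def judge(line, ratio_cutoff=1.9):
    """Return (path, predicted class, sure?) for one fp/fn report line."""
    m = LINE_RE.search(line)
    if m is None:
        raise ValueError(f"unrecognised line: {line!r}")
    zham = abs(float(m.group(1)))
    zspam = abs(float(m.group(2)))
    path = m.group(3)
    # Assumed rule: pick the class whose population the message is closer to.
    predicted = "ham" if zham < zspam else "spam"
    near, far = min(zham, zspam), max(zham, zspam)
    # Assumed rule: "sure" when the far |z| is at least ratio_cutoff times the near one.
    sure = far >= ratio_cutoff * near
    return path, predicted, sure

if __name__ == "__main__":
    print(judge("FALSE POSITIVE: zham=3.55 zspam=-1.56 Data/Ham/Set4/69 SURE!"))
    print(judge("FALSE NEGATIVE: zham=0.60 zspam=-3.25 Data/Spam/Set2/5185 SURE!"))
```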



[Tokenizer]
mine_received_headers: True

[Classifier]
use_central_limit = True
use_central_limit2 = False
use_central_limit3 = False
zscore_ratio_cutoff: 1.9

[TestDriver]
spam_cutoff: 0.50
show_false_negatives: True
nbuckets: 4

show_spam_lo: 0.0
show_spam_hi: 0.45

save_trained_pickles: True
save_histogram_pickles: True
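The option block above is plain INI syntax. It mixes `key: value` and `key = value`, and Python's standard `configparser` accepts both. Spambayes has its own Options module for reading these settings. This sketch only shows that the block parses as written and how the typed values come out:

```python
# Parse the option block above with the standard library's configparser.
# This only illustrates the file format; it is not how Spambayes loads it.
import configparser

OPTIONS = """\
[Tokenizer]
mine_received_headers: True

[Classifier]
use_central_limit = True
use_central_limit2 = False
use_central_limit3 = False
zscore_ratio_cutoff: 1.9

[TestDriver]
spam_cutoff: 0.50
show_false_negatives: True
nbuckets: 4

show_spam_lo: 0.0
show_spam_hi: 0.45

save_trained_pickles: True
save_histogram_pickles: True
"""

def load_options(text=OPTIONS):
    """Return a ConfigParser loaded from the INI text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser

if __name__ == "__main__":
    opts = load_options()
    print(opts.getboolean("Classifier", "use_central_limit"))   # True
    print(opts.getfloat("Classifier", "zscore_ratio_cutoff"))   # 1.9
    print(opts.getint("TestDriver", "nbuckets"))                # 4
```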


Brad Clements,                bkc@murkworks.com   (315)268-1000
http://www.murkworks.com                          (315)268-9812 Fax
AOL-IM: BKClements