[Spambayes] CL2 test part II

Sun, 06 Oct 2002 18:53:13 -0400

[Brad Clements]
> In my earlier CL2 and CL3 tests, I trained on the 2nd half of my
> corpus, and tested the first half.
>
> Now, I'm training on the first half and testing the 2nd half.
>
> First run of CL2 uncovered more misclassifications (which
> probably affected the training of my first test).
>
> ...
>
> In any case, here's CL2 results training first, testing second half.
>
> -> <stat> Ham scores for all runs: 6500 items; mean 0.94; sdev 7.21
> -> <stat> min 0; median 0; max 100
> * = 105 items
>   0 6384 *************************************************************
>  25   87 *
>  50   21 *
>  75    8 *
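
A note on reading these histograms: the "* = 105 items" scale and the bar lengths are consistent with scaling the fullest bucket to about 61 columns and rounding up, so any non-empty bucket shows at least one star. The sketch below reproduces that layout. The name `render_histogram` and the 61-column width are assumptions inferred from the listings in this message, not taken from the test driver's source.

```python
import math


def render_histogram(counts, width=61):
    """Return text lines for bucket counts, scaling bars to fit `width`.

    width=61 is an assumption inferred from the listings, not a known default.
    """
    # Each star stands for enough items that the fullest bucket fits in `width`.
    per_star = max(1, math.ceil(max(counts) / width))
    lines = [f"* = {per_star} items"]
    step = 100 // len(counts)  # bucket labels 0, 25, 50, 75 for four buckets
    for i, n in enumerate(counts):
        # Rounding up means a bucket with even one item still gets a star.
        stars = "*" * math.ceil(n / per_star)
        lines.append(f"{i * step:3d} {n:4d} {stars}")
    return lines


# Ham bucket counts from the CL2 listing above.
for line in render_histogram([6384, 87, 21, 8]):
    print(line)
```

The practical point is that a single star in the 25, 50 or 75 rows says only "not empty"; the counts in the second column are what matter.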

Let me interleave these with the results from various followup msgs.

Ham for clt2 rmspick:

Reading results/cl2-b/clim.pik ...
Nham= 6500
RmsZham= 4.48175325539
Nspam= 6500
RmsZspam= 3.7809204202
======================================================================
HAM:
FALSE POSITIVE: zham=-5.79 zspam=-2.33 Data/Ham/Set6/6438 SURE!
FALSE POSITIVE: zham=-5.43 zspam=-2.27 Data/Ham/Set6/10068 SURE!
FALSE POSITIVE: zham=-3.97 zspam=-1.72 Data/Ham/Set7/9964 SURE!
FALSE POSITIVE: zham=-6.17 zspam=-2.35 Data/Ham/Set9/6415 SURE!
Sure/ok       6297
Unsure/ok     181
Unsure/not ok 18
Sure/not ok   4
Unsure rate = 3.06%
Sure fp rate = 0.06%; Unsure fp rate = 9.05%

Ham for clt3:

-> <stat> Ham scores for all runs: 6500 items; mean 0.98; sdev 7.64
-> <stat> min 0; median 0; max 100
* = 105 items
  0 6386 *************************************************************
 25   74 *
 50   26 *
 75   14 *

Ham for clt3 rmspick:

Reading results/cl3-b/clim.pik ...
Nham= 6500
RmsZham= 12.6590343376
Nspam= 6500
RmsZspam= 14.8475623174
======================================================================
HAM:
FALSE POSITIVE: zham=-6.65 zspam=-2.35 Data/Ham/Set6/6438 SURE!
FALSE POSITIVE: zham=-6.19 zspam=-2.29 Data/Ham/Set6/10068 SURE!
FALSE POSITIVE: zham=-4.60 zspam=-1.73 Data/Ham/Set7/9964 SURE!
FALSE POSITIVE: zham=-7.11 zspam=-2.37 Data/Ham/Set9/6415 SURE!
Sure/ok       6294
Unsure/ok     182
Unsure/not ok 20
Sure/not ok   4
Unsure rate = 3.11%
Sure fp rate = 0.06%; Unsure fp rate = 9.90%

I don't see a significant difference between clt2 & clt3 here; clt2 may be
doing slightly better.  The differences after rmspick are clearly
insignificant.  rmspick is unsure about twice as often as raw clt2, but is
dead wrong half as often, and does even better relative to raw clt3.
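
The three rate lines in each rmspick summary follow from the four counts above them. Here is a minimal sketch of that arithmetic. The function name `rmspick_rates` and the dictionary keys are made up for illustration, and the formulas are inferred from the printed numbers rather than read from rmspick's source.

```python
def rmspick_rates(sure_ok, unsure_ok, unsure_bad, sure_bad):
    """Recompute the summary percentages from the four outcome counts."""
    total = sure_ok + unsure_ok + unsure_bad + sure_bad
    unsure = unsure_ok + unsure_bad
    return {
        # Share of all messages that landed in the unsure group.
        "unsure_rate": 100.0 * unsure / total,
        # Confident mistakes, as a share of all messages.
        "sure_error_rate": 100.0 * sure_bad / total,
        # Mistakes inside the unsure group, as a share of that group only.
        "unsure_error_rate": 100.0 * unsure_bad / unsure if unsure else 0.0,
    }


# Ham counts from the two rmspick summaries above.
clt2_ham = rmspick_rates(6297, 181, 18, 4)
clt3_ham = rmspick_rates(6294, 182, 20, 4)
for name, rates in (("clt2", clt2_ham), ("clt3", clt3_ham)):
    print(name, {key: round(value, 2) for key, value in rates.items()})
```

Note the different denominators: the sure rate is over all 6500 messages, while the unsure error rate is over the unsure messages only, which is why it looks so much larger.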

On to the spam:

> -> <stat> Spam scores for all runs: 6500 items; mean 99.32; sdev 5.94
> -> <stat> min 0; median 100; max 100
> * = 106 items
>   0    3 *
>  25   15 *
>  50   68 *
>  75 6414 *************************************************************
> -> best cutoff for all runs: 0.5
> ->     with weighted total 1*29 fp + 18 fn = 47
> ->     fp rate 0.446%  fn rate 0.277%
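
The "weighted total" line can be checked against the CL2 histograms: with a cutoff of 0.5 and four buckets, false positives are the ham in the 50 and 75 buckets, and false negatives are the spam in the 0 and 25 buckets. A rough sketch follows. `errors_at_cutoff` is a hypothetical helper, and working from bucket counts is a simplification that only holds because the cutoff falls on a bucket edge; the real test driver presumably works from individual scores.

```python
def errors_at_cutoff(ham_buckets, spam_buckets, cutoff_bucket, fp_weight=1):
    """Count errors when everything from `cutoff_bucket` upward is called spam."""
    fp = sum(ham_buckets[cutoff_bucket:])    # ham scored at or above the cutoff
    fn = sum(spam_buckets[:cutoff_bucket])   # spam scored below the cutoff
    return {
        "fp": fp,
        "fn": fn,
        "weighted_total": fp_weight * fp + fn,
        "fp_rate": 100.0 * fp / sum(ham_buckets),
        "fn_rate": 100.0 * fn / sum(spam_buckets),
    }


# CL2 bucket counts from the ham and spam listings (buckets 0, 25, 50, 75).
cl2_ham = [6384, 87, 21, 8]
cl2_spam = [3, 15, 68, 6414]
cl2 = errors_at_cutoff(cl2_ham, cl2_spam, cutoff_bucket=2)
print(cl2["fp"], cl2["fn"], cl2["weighted_total"])
print(f"fp rate {cl2['fp_rate']:.3f}%  fn rate {cl2['fn_rate']:.3f}%")
```

Raising `fp_weight` above 1 is how a false positive would be made to count for more than a false negative in the total.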

Spam for clt2 rmspick:

SPAM:
FALSE NEGATIVE: zham=-1.86 zspam=-3.48 Data/Spam/Set7/6718 SURE!
FALSE NEGATIVE: zham=-1.48 zspam=-6.37 Data/Spam/Set10/10979 SURE!  BOGUS
Sure/ok       6240
Unsure/ok     232
Unsure/not ok 26
Sure/not ok   2
Unsure rate = 3.97%
Sure fn rate = 0.03%; Unsure fn rate = 10.08%

The 2nd false negative was bogus (really a ham).  rmspick was uncertain
about 3x as often as raw clt2.

Spam for clt3:

-> <stat> Spam scores for all runs: 6500 items; mean 98.85; sdev 7.84
-> <stat> min 0; median 100; max 100
* = 105 items
  0    8 *
 25   22 *
 50  113 **
 75 6357 *************************************************************
-> best cutoff for all runs: 0.5
->     with weighted total 1*19 fp + 30 fn = 49
->     fp rate 0.292%  fn rate 0.462%

Less certain than clt2, *and* made more mistakes when certain (that's a bad
combination <wink>).

Spam for clt3 rmspick:

SPAM:
FALSE NEGATIVE: zham=-1.37 zspam=-6.36 Data/Spam/Set10/10979 SURE!
Sure/ok       6271
Unsure/ok     207
Unsure/not ok 21
Sure/not ok   1
Unsure rate = 3.51%
Sure fn rate = 0.02%; Unsure fn rate = 9.21%

Uncertain more often than raw clt3, but far fewer errors when certain.

Overall, I'd say that clt2 works better for you than clt3, and that rmspick
gives an improvement either way.  I bet you're just dying to try clt1
<wink>.

> [Tokenizer]
> mine_received_headers: True
>
> [Classifier]
> use_central_limit2 = True
> use_central_limit3 = False
> zscore_ratio_cutoff: 1.9
>
> [TestDriver]
> spam_cutoff: 0.50
> show_false_negatives: True
> nbuckets: 4
>
> show_spam_lo: 0.0
> show_spam_hi: 0.45
>
> save_trained_pickles: True
> save_histogram_pickles: True
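
For anyone wanting to inspect a settings file like this one, here is a small sketch using Python's standard `configparser` module (Python 3 spelling). It shows only that the mixed `:` and `=` separators above both parse by default; how the project itself loads its options is not shown in this message, and the variable names are illustrative.

```python
import configparser

# The option text quoted above, with the "> " quoting removed.
SETTINGS = """\
[Tokenizer]
mine_received_headers: True

[Classifier]
use_central_limit2 = True
use_central_limit3 = False
zscore_ratio_cutoff: 1.9

[TestDriver]
spam_cutoff: 0.50
show_false_negatives: True
nbuckets: 4

show_spam_lo: 0.0
show_spam_hi: 0.45

save_trained_pickles: True
save_histogram_pickles: True
"""

parser = configparser.ConfigParser()  # accepts both "=" and ":" by default
parser.read_string(SETTINGS)

use_clt2 = parser.getboolean("Classifier", "use_central_limit2")
use_clt3 = parser.getboolean("Classifier", "use_central_limit3")
ratio_cutoff = parser.getfloat("Classifier", "zscore_ratio_cutoff")
spam_cutoff = parser.getfloat("TestDriver", "spam_cutoff")
nbuckets = parser.getint("TestDriver", "nbuckets")

print(use_clt2, use_clt3, ratio_cutoff, spam_cutoff, nbuckets)
```

Swapping the two `use_central_limit` flags is the only change needed to describe the clt3 runs with the same file.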

Looks good!  Thank you, Brad.