[Spambayes] Training without ham

T. Alexander Popiel popiel@wolfskeep.com
Mon Oct 28 23:31:34 2002


In message:  <LNBBLJKPBEHFEDALKOLCAECDCCAB.tim.one@comcast.net>
             Tim Peters <tim.one@comcast.net> writes:
>[T. Alexander Popiel]
>> Summary: Ham is required in the training set, as expected.
>> ...
>> So yes, spambayes is worthless without ham in the training corpus.
>
>Ya, but that doesn't prove we need to train on spam <wink>.

You are an evil man, Tim.  Just for that, I present the following:

Summary: We need to train on spam, too.

Methodology is identical to my no-ham test, except that I'm using
very little spam instead of very little ham.

-> <stat> tested 200 hams & 200 spams against 1620 hams & 180 spams
[...]
-> <stat> tested 200 hams & 200 spams against 1800 hams & 0 spams
filename:     180-20    185-15    190-10     195-5     198-2     199-1     200-0
ham:spam:  2000:2000 2000:2000 2000:2000 2000:2000 2000:2000 2000:2000 2000:2000
fp total:          1         2         0         0         0         0         0
fp %:           0.05      0.10      0.00      0.00      0.00      0.00      0.00
fn total:         68        77       118       291       672      1223      2000
fn %:           3.40      3.85      5.90     14.55     33.60     61.15    100.00
unsure t:        318       378       554      1011      1160       707         0
unsure %:       7.95      9.45     13.85     25.27     29.00     17.68      0.00
real cost:   $141.60   $172.60   $228.80   $493.20   $904.00  $1364.40  $2000.00
best cost:    $92.60    $98.40   $127.40   $209.40   $371.00   $607.80   $800.00
h mean:         0.29      0.28      0.21      0.11      0.06      0.04      0.00
h sdev:         4.29      4.20      3.04      1.84      1.30      1.19      0.00
s mean:        90.53     88.71     81.88     62.52     37.17     20.21      0.00
s sdev:        22.22     23.77     28.84     33.27     29.01     25.77      0.00
mean diff:     90.24     88.43     81.67     62.41     37.11     20.17      0.00
k:              3.40      3.16      2.56      1.78      1.22      0.75   --NaN--

This is almost a perfect mirror image of the problem at the other
end (training without ham), including the cutoffs approaching 0.0
and 0.005.
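
For anyone who wants to check the arithmetic, the "real cost" and "k"
rows in the table above appear to follow from the other rows. The
sketch below is not taken from the test driver: the weights ($10 per
fp, $1 per fn, $0.20 per unsure) and the formula k = mean diff /
(h sdev + s sdev) are assumptions inferred from the printed figures,
and the function names are made up for illustration. With those
assumptions, the sample columns below come out matching the table to
two decimals.

```python
# Recompute "real cost" and "k" for a few columns of the table above.
# ASSUMPTION: these weights are inferred from the printed figures,
# not quoted from the test driver's configuration.
FP_WEIGHT = 10.00      # dollars per false positive
FN_WEIGHT = 1.00       # dollars per false negative
UNSURE_WEIGHT = 0.20   # dollars per message left in "unsure"


def real_cost(fp, fn, unsure):
    """Weighted cost of a run (hypothetical helper)."""
    return fp * FP_WEIGHT + fn * FN_WEIGHT + unsure * UNSURE_WEIGHT


def separation_k(mean_diff, h_sdev, s_sdev):
    """Mean difference over the summed deviations (hypothetical helper).

    Returns NaN when both deviations are zero, as in the 200-0 column.
    """
    total = h_sdev + s_sdev
    if total == 0:
        return float("nan")
    return mean_diff / total


# A few columns copied from the table: counts and score statistics.
columns = {
    "180-20": dict(fp=1, fn=68, unsure=318,
                   mean_diff=90.24, h_sdev=4.29, s_sdev=22.22),
    "195-5": dict(fp=0, fn=291, unsure=1011,
                  mean_diff=62.41, h_sdev=1.84, s_sdev=33.27),
    "200-0": dict(fp=0, fn=2000, unsure=0,
                  mean_diff=0.00, h_sdev=0.00, s_sdev=0.00),
}

for name, c in columns.items():
    cost = real_cost(c["fp"], c["fn"], c["unsure"])
    k = separation_k(c["mean_diff"], c["h_sdev"], c["s_sdev"])
    print(f"{name}: real cost ${cost:.2f}, k {k:.2f}")
```

Under these assumed weights one false positive costs as much as ten
missed spams, which is why the cost here is driven almost entirely by
the fn and unsure counts once the spam training data dries up.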

I won't bother with more detail on this one.

Tim, you're evil.

- Alex