[Spambayes] Training without ham
T. Alexander Popiel
popiel@wolfskeep.com
Mon Oct 28 23:31:34 2002
In message: <LNBBLJKPBEHFEDALKOLCAECDCCAB.tim.one@comcast.net>
Tim Peters <tim.one@comcast.net> writes:
>[T. Alexander Popiel]
>> Summary: Ham is required in the training set, as expected.
>> ...
>> So yes, spambayes is worthless without ham in the training corpus.
>
>Ya, but that doesn't prove we need to train on spam <wink>.
You are an evil man, Tim. Just for that, I present the following:
Summary: We need to train on spam, too.
Methodology is identical to my no-ham test, except that I'm using
very little spam instead of very little ham.
-> <stat> tested 200 hams & 200 spams against 1620 hams & 180 spams
[...]
-> <stat> tested 200 hams & 200 spams against 1800 hams & 0 spams
filename:       180-20    185-15    190-10     195-5     198-2     199-1     200-0
ham:spam:    2000:2000 2000:2000 2000:2000 2000:2000 2000:2000 2000:2000 2000:2000
fp total:            1         2         0         0         0         0         0
fp %:             0.05      0.10      0.00      0.00      0.00      0.00      0.00
fn total:           68        77       118       291       672      1223      2000
fn %:             3.40      3.85      5.90     14.55     33.60     61.15    100.00
unsure t:          318       378       554      1011      1160       707         0
unsure %:         7.95      9.45     13.85     25.27     29.00     17.68      0.00
real cost:     $141.60   $172.60   $228.80   $493.20   $904.00  $1364.40  $2000.00
best cost:      $92.60    $98.40   $127.40   $209.40   $371.00   $607.80   $800.00
h mean:           0.29      0.28      0.21      0.11      0.06      0.04      0.00
h sdev:           4.29      4.20      3.04      1.84      1.30      1.19      0.00
s mean:          90.53     88.71     81.88     62.52     37.17     20.21      0.00
s sdev:          22.22     23.77     28.84     33.27     29.01     25.77      0.00
mean diff:       90.24     88.43     81.67     62.41     37.11     20.17      0.00
k:                3.40      3.16      2.56      1.78      1.22      0.75   --NaN--
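For anyone who wants to recheck the derived rows, here is a minimal sketch that recomputes them from the raw counts. The message does not state the formulas. The cost weights ($10 per fp, $1 per fn, $0.20 per unsure) and the definition k = (s mean - h mean) / (h sdev + s sdev) are assumptions inferred from the arithmetic of the table, and the function names are hypothetical rather than part of spambayes.

```python
# Raw values copied from the table in this message.
columns = ["180-20", "185-15", "190-10", "195-5", "198-2", "199-1", "200-0"]
fp      = [1, 2, 0, 0, 0, 0, 0]
fn      = [68, 77, 118, 291, 672, 1223, 2000]
unsure  = [318, 378, 554, 1011, 1160, 707, 0]
h_mean  = [0.29, 0.28, 0.21, 0.11, 0.06, 0.04, 0.00]
h_sdev  = [4.29, 4.20, 3.04, 1.84, 1.30, 1.19, 0.00]
s_mean  = [90.53, 88.71, 81.88, 62.52, 37.17, 20.21, 0.00]
s_sdev  = [22.22, 23.77, 28.84, 33.27, 29.01, 25.77, 0.00]

N_HAM = 2000   # hams scored per column (ham:spam row)
N_SPAM = 2000  # spams scored per column

# ASSUMPTION: weights inferred from the "real cost" row, not stated in the post.
FP_COST, FN_COST, UNSURE_COST = 10.0, 1.0, 0.2


def real_cost(n_fp, n_fn, n_unsure):
    """Weighted cost of the mistakes in one column, in dollars."""
    return FP_COST * n_fp + FN_COST * n_fn + UNSURE_COST * n_unsure


def separation_k(hm, hs, sm, ss):
    """ASSUMED definition: distance between means over the sum of sdevs."""
    spread = hs + ss
    if spread == 0:
        return float("nan")  # matches the --NaN-- entry in the last column
    return (sm - hm) / spread


for i, name in enumerate(columns):
    fp_pct = 100.0 * fp[i] / N_HAM                     # fp % is relative to hams
    fn_pct = 100.0 * fn[i] / N_SPAM                    # fn % is relative to spams
    unsure_pct = 100.0 * unsure[i] / (N_HAM + N_SPAM)  # unsure % is over all mail
    cost = real_cost(fp[i], fn[i], unsure[i])
    k = separation_k(h_mean[i], h_sdev[i], s_mean[i], s_sdev[i])
    print(f"{name:>7}  fp%={fp_pct:.2f}  fn%={fn_pct:.2f}  "
          f"unsure%={unsure_pct:.2f}  cost=${cost:.2f}  k={k:.2f}")
```

Under these assumptions the recomputed costs and k values agree with the table to the printed precision, which is why I read k as a measure of how far apart the ham and spam score distributions sit relative to their spread.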
This is an almost perfect mirror image of the results from the no-ham
test, including the cutoffs approaching 0.0 and 0.005.
I won't bother with more detail on this one.
Tim, you're evil.
- Alex