[Spambayes] Improved comparison of classifier changes?
T. Alexander Popiel
popiel at wolfskeep.com
Fri Mar 7 10:29:04 EST 2003
In message: <9891913C5BFE87429D71E37F08210CB9297597 at zeus.sfhq.friskit.com>
"Piers Haken" <piersh at friskit.com> writes:
>(This came to me in a dream. No, really...)
>
>When comparing two different classifier/tokenizer strategies, instead of
>just comparing the numbers of false negatives and positives, how about
>comparing some function (product, sum, average,
>some-more-appropriate-statistical-function?) of the spam probability of
>all messages in each classification (spam, ham, false-positive,
>false-negative)? This might give a slightly better indication of not
>just the numbers of messages that were classified correctly/incorrectly,
>but of how sure the classifier was when it made those decisions.
>
>.. or was I just dreaming...?
Here's sample output from table.py:
filename:        rcb       rcB       rCb       rCB       Rcb       RcB       RCb       RCB
ham:spam:  2000:2000 2000:2000 2000:2000 2000:2000 2000:2000 2000:2000 2000:2000 2000:2000
fp total:          3         3         3         3         3         3         3         3
fp %:           0.15      0.15      0.15      0.15      0.15      0.15      0.15      0.15
fn total:         12        14        16        14        12        12        12        12
fn %:           0.60      0.70      0.80      0.70      0.60      0.60      0.60      0.60
unsure t:         53        37        50        39        40        31        37        32
unsure %:       1.32      0.93      1.25      0.97      1.00      0.78      0.93      0.80
real cost:    $52.60    $51.40    $56.00    $51.80    $50.00    $48.20    $49.40    $48.40
best cost:    $48.20    $45.20    $49.20    $45.60    $37.20    $38.80    $40.60    $38.60
h mean:         0.40      0.32      0.35      0.32      0.31      0.30      0.29      0.29
h sdev:         5.39      4.71      5.12      4.68      4.55      4.47      4.47      4.43
s mean:        98.45     98.68     98.35     98.68     98.75     98.85     98.72     98.85
s sdev:         9.76      9.57     10.46      9.58      9.08      9.06      9.37      9.11
mean diff:     98.05     98.36     98.00     98.36     98.44     98.55     98.43     98.56
k:              6.47      6.89      6.29      6.90      7.22      7.28      7.11      7.28
So yes, when using the test harness and associated tools, we do
compare more than just the fp and fn counts. We also look at
percentages, a weighted cost function, the best possible cost
achievable just by moving the ham and spam cutoffs, and the
mean scores, their separation, and their standard deviations.
We just haven't done much tokenizer testing lately, so these
reports aren't obvious in the recent archives.
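
For anyone wanting to check the arithmetic, the "real cost" and "k" rows in the sample output can be reproduced from the other rows. The following is a minimal sketch, not the table.py code itself. The weights ($10 per fp, $1 per fn, $0.20 per unsure) and the k formula (mean diff divided by the sum of the two standard deviations) are assumptions inferred from the figures above, which they happen to match. The helper names real_cost and separation_k are hypothetical.

```python
# Sketch only: the weights and the k formula are inferred from the sample
# output above, not copied from table.py.
FP_WEIGHT = 10.00      # assumed dollars per false positive
FN_WEIGHT = 1.00       # assumed dollars per false negative
UNSURE_WEIGHT = 0.20   # assumed dollars per unsure message


def real_cost(fp, fn, unsure):
    """Weighted cost of one test run (hypothetical helper)."""
    return fp * FP_WEIGHT + fn * FN_WEIGHT + unsure * UNSURE_WEIGHT


def separation_k(h_mean, h_sdev, s_mean, s_sdev):
    """Gap between the spam and ham mean scores, in units of the
    summed standard deviations (assumed definition of the k row)."""
    return (s_mean - h_mean) / (h_sdev + s_sdev)


# Figures taken from the "rcb" column of the sample output.
print("real cost: $%.2f" % real_cost(fp=3, fn=12, unsure=53))      # real cost: $52.60
print("k: %.2f" % separation_k(0.40, 5.39, 98.45, 9.76))           # k: 6.47
```

A larger k means the ham and spam score distributions sit further apart relative to their spread, which is the "how sure was the classifier" measure the question asks about.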
- Alex