[Spambayes] chi-squared versus "prob strength"

Tim Peters tim.one@comcast.net
Mon, 14 Oct 2002 23:32:57 -0400


[Tim, to Rob, on switching from S/(S+H) to (S-H+1)/2]
> I was, but more importantly my test data agreed, so I'm going
> to switch to this (the evidence is so consistent and solid on both
> our datasets that making it an option would supply a pointless
> choice -- losers are killed).  Good show!

[Anthony Baxter]
> Here's what my mungo-test set shows for this (before is pre-Rob Hooft's
> change, after is current CVS)

This would have been a useful result, but, unfortunately, you ran it before
the histogram analysis was beefed up to tell us the useful bits.  If you
still have the final ("all runs") ham and spam histograms from *both* runs
in output files, you could post much more useful info by running them thru
cvcost.py.  With some pain I can run the new histo analysis for you on your
"after" run, because you included the full final histograms for that:

-> best cost for Anthony's CVS run: $626.00
-> per-fp cost $10.00; per-fn cost $1.00; per-unsure cost $0.20
-> achieved at ham & spam cutoffs 0.85 & 0.995
->     fp 50; fn 75; unsure ham 110; unsure spam 145
->     fp rate 0.143%; fn rate 0.445%; unsure rate 0.493%
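
For anyone checking the arithmetic, the $626.00 figure follows directly from
the counts and per-error costs printed above.  This is only a sketch of that
sum, not the code cvcost.py actually runs; the function name total_cost is
made up for illustration.

```python
def total_cost(fp, fn, unsure, fp_cost=10.00, fn_cost=1.00, unsure_cost=0.20):
    # Hypothetical helper: weighted sum of the three kinds of mistakes,
    # using the per-fp, per-fn and per-unsure costs from the output above.
    return fp * fp_cost + fn * fn_cost + unsure * unsure_cost

# Counts taken from the "after" run: unsure = unsure ham + unsure spam.
cost = total_cost(fp=50, fn=75, unsure=110 + 145)
print("$%.2f" % cost)

# The fn rate is fn divided by the number of spam (16848 in this run).
fn_rate = 100.0 * 75 / 16848
print("fn rate %.3f%%" % fn_rate)
```

Note that the 50 fp alone account for $500 of the total, which is why the
best ham cutoff ends up so high.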

It's a peculiar pair of cutoffs, reflecting that you have few low-scoring
spam but (relative to others) many high-scoring ham.

The analysis is limited by nbuckets=200, as 50 of your ham scored in the
highest bucket:

99.5    50 *

and so there's no way to get rid of more FP at this granularity short of
calling everything ham.  However, your *median* spam score was 100:

-> <stat> Spam scores for all runs: 16848 items; mean 99.75; sdev 3.75
-> <stat> min 0.00333927; median 100; max 100

meaning that at least half your spam scored 100, so there may well be useful
distinctions still to be drawn if only we could peer inside the top (99.5)
bucket.  Your data is so nasty I think 200 buckets is too few for you; try
1000 next time?
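
To see why more buckets would help, here is a rough sketch of equal-width
bucketing on a 0-100 scale.  The function bucket_index and its clamping rule
are assumptions for illustration, not necessarily how the test driver's
histogram code assigns buckets.

```python
def bucket_index(score, nbuckets):
    # Assumption: scores lie in [0, 100] and buckets are equal-width,
    # with a score of exactly 100 clamped into the last bucket.
    i = int(score * nbuckets / 100.0)
    return min(i, nbuckets - 1)

scores = [99.5, 99.75, 100.0]

# With 200 buckets each bucket is 0.5 wide, so all three collapse together.
coarse = [bucket_index(s, 200) for s in scores]
# With 1000 buckets each bucket is 0.1 wide, so they land in separate buckets.
fine = [bucket_index(s, 1000) for s in scores]

print(coarse)
print(fine)
```

At 200 buckets nothing scoring 99.5 or above can be told apart, so no cutoff
can split the 50 high-scoring ham from the spam sharing that bucket.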

In any case, the idea that these lines were telling useful truths:

> total unique fp went from 261 to 281 lost    +7.66%
> total unique fn went from 60 to 53 won    -11.67%

is right out.  In part, those say two things:

1. spam_cutoff was too low for the "after" run.

2. A single spam_cutoff doesn't make sense for the middle-ground methods:
   we're trying to *get* you a useful middle ground here, a small
   number of nasty msgs where we have strong reason to believe many
   mistakes will live.
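
A minimal sketch of what a two-cutoff scheme does, using the cutoffs from the
cost analysis above on a 0-1 scale.  The names classify and tally, the
handling of scores exactly at a cutoff, and the sample scores are all made up
for illustration.

```python
def classify(score, ham_cutoff, spam_cutoff):
    # Assumed boundary convention: below ham_cutoff is ham, at or above
    # spam_cutoff is spam, and anything in between is the middle ground.
    if score < ham_cutoff:
        return "ham"
    if score >= spam_cutoff:
        return "spam"
    return "unsure"

def tally(ham_scores, spam_scores, ham_cutoff, spam_cutoff):
    counts = {"fp": 0, "fn": 0, "unsure ham": 0, "unsure spam": 0}
    for s in ham_scores:
        verdict = classify(s, ham_cutoff, spam_cutoff)
        if verdict == "spam":
            counts["fp"] += 1          # ham called spam
        elif verdict == "unsure":
            counts["unsure ham"] += 1
    for s in spam_scores:
        verdict = classify(s, ham_cutoff, spam_cutoff)
        if verdict == "ham":
            counts["fn"] += 1          # spam called ham
        elif verdict == "unsure":
            counts["unsure spam"] += 1
    return counts

# Invented scores, purely to exercise the three outcomes.
ham_scores = [0.01, 0.20, 0.90, 0.999]
spam_scores = [0.50, 0.97, 0.995, 1.0]
counts = tally(ham_scores, spam_scores, ham_cutoff=0.85, spam_cutoff=0.995)
print(counts)
```

With a single spam_cutoff there is no "unsure" outcome at all, so every msg
in the middle ground is forced to show up as an fp or an fn.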