[Spambayes] Total cost analysis

Rob Hooft rob@hooft.net
Mon, 14 Oct 2002 22:12:38 +0200


Tim Peters wrote:
> CAUTION:  For the attached histogram pair, cvcost sez:
> 
>     tcap.txt: Optimal cost is $10.0 with grey zone between 89.0 and 97.0
> 
> but the new histogram analysis says:
> 
> -> best cost $0.80
> -> per-fp cost $10.00; per-fn cost $1.00; per-unsure cost $0.20
> -> achieved at 24 cutoff pairs
> -> smallest ham & spam cutoffs 0.855 & 0.995
> ->     fp 0; fn 0; unsure ham 1; unsure spam 3
> ->     fp rate 0%; fn rate 0%; unsure rate 2%
> -> largest ham & spam cutoffs 0.97 & 0.995
> ->     fp 0; fn 0; unsure ham 1; unsure spam 3
> ->     fp rate 0%; fn rate 0%; unsure rate 2%
> 
> and eyeballing the histograms shows that the latter is correct.  I don't
> know why cvcost.py thinks $10.00 is the best that can be done; I suspect
> it's because it's skipping some cutoff pairs in order to save time.

Yep, cvcost.py only tries cutoffs at whole percentage points, so it can 
never pick a spam cutoff like 0.995. It is a quick hack that should be 
done away with now that the same cost search is implemented in the 
histogram analysis.
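
To illustrate the effect, here is a minimal sketch with made-up scores. It is not the tcap.txt data, and it is not the actual code of cvcost.py or the histogram analysis. The names HAM, SPAM, total_cost and best_cost are hypothetical, and I am assuming that a score below the ham cutoff counts as ham and a score at or above the spam cutoff counts as spam. Only the per-fp, per-fn and per-unsure costs are taken from the output quoted above. Scores are stored in thousandths and costs in cents so that all arithmetic is exact.

```python
# Toy data (an assumption for illustration, NOT the scores from tcap.txt).
# Scores are in thousandths: 992 means 0.992.
HAM = [10] * 50 + [992]              # one ham scores very high
SPAM = [975, 980, 985] + [998] * 50  # three spam score a bit low

# Costs in cents: $10.00 per fp, $1.00 per fn, $0.20 per unsure.
FP_COST, FN_COST, UNSURE_COST = 1000, 100, 20


def total_cost(ham_cutoff, spam_cutoff):
    """Total cost in cents for one pair of cutoffs (in thousandths).

    Assumed rule: score < ham_cutoff is ham, score >= spam_cutoff is
    spam, anything in between is unsure.
    """
    cost = 0
    for score in HAM:
        if score >= spam_cutoff:
            cost += FP_COST          # ham called spam
        elif score >= ham_cutoff:
            cost += UNSURE_COST      # ham in the grey zone
    for score in SPAM:
        if score < ham_cutoff:
            cost += FN_COST          # spam called ham
        elif score < spam_cutoff:
            cost += UNSURE_COST      # spam in the grey zone
    return cost


def best_cost(step):
    """Exhaustive search over all cutoff pairs on a grid with this step."""
    grid = range(0, 1001, step)
    return min(total_cost(h, s) for h in grid for s in grid if h <= s)


coarse = best_cost(10)  # whole percentage points only
fine = best_cost(5)     # half-percent steps
print("whole-percent grid: $%.2f" % (coarse / 100.0))
print("half-percent grid:  $%.2f" % (fine / 100.0))
```

With this toy data the only spam cutoff that separates the high-scoring ham (0.992) from the bulk of the spam (0.998) is 0.995, which a whole-percent grid never visits. The coarse search therefore has to accept a false positive, while the finer search gets away with four unsures.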

Rob

-- 
Rob W.W. Hooft  ||  rob@hooft.net  ||  http://www.hooft.net/people/rob/