[Spambayes] Total cost analysis
Rob Hooft
rob@hooft.net
Mon, 14 Oct 2002 22:12:38 +0200
Tim Peters wrote:
> CAUTION: For the attached histogram pair, cvcost sez:
>
> tcap.txt: Optimal cost is $10.0 with grey zone between 89.0 and 97.0
>
> but the new histogram analysis says:
>
> -> best cost $0.80
> -> per-fp cost $10.00; per-fn cost $1.00; per-unsure cost $0.20
> -> achieved at 24 cutoff pairs
> -> smallest ham & spam cutoffs 0.855 & 0.995
> -> fp 0; fn 0; unsure ham 1; unsure spam 3
> -> fp rate 0%; fn rate 0%; unsure rate 2%
> -> largest ham & spam cutoffs 0.97 & 0.995
> -> fp 0; fn 0; unsure ham 1; unsure spam 3
> -> fp rate 0%; fn rate 0%; unsure rate 2%
>
> and eyeballing the histograms shows that the latter is correct. I don't
> know why cvcost.py thinks $10.00 is the best that can be done; I suspect
> it's because it's skipping some cutoff pairs in order to save time.
Yep, cvcost.py only tries cutoffs at whole percentage points, so it can
never land on a spam cutoff like 0.995. It was a quick hack, and it should
be done away with now that the same cost search is implemented in the
histogram analysis.
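
A minimal sketch of the effect, assuming made-up scores rather than the
attached histograms: the names total_cost and best_cutoffs are
hypothetical and this is not the code in cvcost.py or the histogram
analysis. It also assumes a message counts as ham below the ham cutoff,
as spam at or above the spam cutoff, and as unsure in between. Scores and
cutoffs are integers in thousandths and costs are in cents, to keep the
arithmetic exact.

```python
from itertools import product

# Costs in cents, matching the $10.00 / $1.00 / $0.20 quoted above.
FP_COST, FN_COST, UNSURE_COST = 1000, 100, 20

# Made-up scores in thousandths (995 means 0.995); one ham scores 0.992.
HAM = [5, 12, 40, 130, 992]
SPAM = [870, 996, 998, 999, 1000]

def total_cost(ham, spam, ham_cutoff, spam_cutoff):
    """Total cost in cents for one (ham_cutoff, spam_cutoff) pair."""
    cost = 0
    for score in ham:
        if score >= spam_cutoff:      # ham called spam: false positive
            cost += FP_COST
        elif score >= ham_cutoff:     # ham in the grey zone: unsure
            cost += UNSURE_COST
    for score in spam:
        if score < ham_cutoff:        # spam called ham: false negative
            cost += FN_COST
        elif score < spam_cutoff:     # spam in the grey zone: unsure
            cost += UNSURE_COST
    return cost

def best_cutoffs(ham, spam, step):
    """Cheapest (cost, ham_cutoff, spam_cutoff) on a grid with this step."""
    grid = range(0, 1001, step)
    return min(
        (total_cost(ham, spam, h, s), h, s)
        for h, s in product(grid, repeat=2)
        if h <= s
    )

print(best_cutoffs(HAM, SPAM, 10))  # whole percentage points
print(best_cutoffs(HAM, SPAM, 5))   # half percentage points
```

On this toy data the whole-percent grid can only choose a spam cutoff of
0.99 (one fp) or 1.00 (three more unsure spam), so it settles on $1.00,
while the half-percent grid reaches 0.995 and gets down to $0.40.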
Rob
--
Rob W.W. Hooft || rob@hooft.net || http://www.hooft.net/people/rob/