[spambayes-dev] RE: [Spambayes] How low can you go?

T. Alexander Popiel popiel at wolfskeep.com
Mon Dec 29 17:00:14 EST 2003


In message:  <LNBBLJKPBEHFEDALKOLCAEBHIBAB.tim.one at comcast.net>
             "Tim Peters" <tim.one at comcast.net> writes:
>[T. Alexander Popiel]
>> ...
>> Yup.  I have a nice picture now of the ratio over time at the bottom
>> of the report at:
>> http://www.wolfskeep.com/~popiel/spambayes/nonedge
>
>Hmm.  That appears to be using a log scale for the Y (ratio) axis, so what
>*appears* to be straight-line growth in the ratio after about day 150 is
>really exponential growth.  That could get bad over time <wink>.

Yeah, I used log scale for the ratio... log makes more sense to me for
ratios.  I can trivially replot on linear scale if you want. ;-)
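
For anyone following along, here is a small Python sketch of why a straight line on a log-scale Y axis means exponential growth. The function name, intercept, and slope are made up for illustration; they are not fitted to the data in the report.

```python
def ratio_from_log_line(day, intercept, slope):
    """Ratio implied by a straight line on a log10 Y axis:
    log10(ratio) = intercept + slope * day."""
    return 10 ** (intercept + slope * day)

# Hypothetical line parameters, NOT taken from the actual plot.
INTERCEPT = 0.0
SLOPE = 0.002

r150 = ratio_from_log_line(150, INTERCEPT, SLOPE)
r200 = ratio_from_log_line(200, INTERCEPT, SLOPE)
r250 = ratio_from_log_line(250, INTERCEPT, SLOPE)

# Equal steps in days multiply the ratio by the same factor,
# which is the defining property of exponential growth.
factor_first = r200 / r150
factor_second = r250 / r200
```

On a linear axis the same data would curve upward, since each 50-day step multiplies the ratio by the same factor rather than adding the same amount.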

>Oh, there are billions of things that could be tried.  Who knows what might
>pay?

Aye, there are.  I don't have billions of CPU-days to burn, though,
so I'm trying to winnow down to stuff that's likely to pay off.
Theoretical beauty is one measure that sort of appeals.

><wink -- but "two decimal digits" is just an artifact of how scores get
>displayed>.

No argument there.  I have no particular love for that rule, either.

>Asymmetric bounds also have some attraction, since, e.g., in mistake-based
>training "by hand" I always end up moving the ham cutoff closer to 0 than
>the spam cutoff is to 1.

One thing that's occurred to me is to put the training cutoffs at
N sigma from the mean (N == 0.5, perhaps?) of each of the two
populations; how you'd bootstrap that is an open question, of course.
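
A minimal Python sketch of the N-sigma idea, assuming you already have a list of scores for each population (which is exactly the bootstrap problem, so this only covers the steady state). The function name `sigma_cutoffs` is hypothetical and not part of spambayes, and the direction of the offset (ham cutoff above the ham mean, spam cutoff below the spam mean) is my assumption about what "from mean" should mean here.

```python
from statistics import mean, pstdev

def sigma_cutoffs(ham_scores, spam_scores, n_sigma=0.5):
    """Return (ham_cutoff, spam_cutoff) placed n_sigma standard
    deviations from each population's mean, moving toward the
    other population. Scores are assumed to lie in [0, 1]."""
    ham_cutoff = mean(ham_scores) + n_sigma * pstdev(ham_scores)
    spam_cutoff = mean(spam_scores) - n_sigma * pstdev(spam_scores)
    # Clamp so the cutoffs stay inside the valid score range.
    ham_cutoff = min(max(ham_cutoff, 0.0), 1.0)
    spam_cutoff = min(max(spam_cutoff, 0.0), 1.0)
    return ham_cutoff, spam_cutoff

# Made-up scores for illustration only.
ham_cutoff, spam_cutoff = sigma_cutoffs([0.0, 0.1, 0.2], [0.8, 0.9, 1.0])
```

Because each cutoff depends only on its own population's spread, the two bounds come out asymmetric whenever the ham and spam score distributions differ in width.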

- Alex


