[spambayes-dev] RE: [Spambayes] How low can you go?
T. Alexander Popiel
popiel at wolfskeep.com
Mon Dec 29 17:00:14 EST 2003
In message: <LNBBLJKPBEHFEDALKOLCAEBHIBAB.tim.one at comcast.net>
"Tim Peters" <tim.one at comcast.net> writes:
>[T. Alexander Popiel]
>> ...
>> Yup. I have a nice picture now of the ratio over time at the bottom
>> of the report at:
>> http://www.wolfskeep.com/~popiel/spambayes/nonedge
>
>Hmm. That appears to be using a log scale for the Y (ratio) axis, so what
>*appears* to be straight-line growth in the ratio after about day 150 is
>really exponential growth. That could get bad over time <wink>.
Yeah, I used log scale for the ratio... log makes more sense to me for
ratios. I can trivially replot on linear scale if you want. ;-)
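
To illustrate the point about the log-scale plot: if log(ratio) grows along a
straight line in days, the ratio itself grows exponentially, and the slope of
that line gives a doubling time. The sketch below is only an illustration of
that arithmetic. The function names and sample numbers are hypothetical and
are not taken from the report.

```python
import math

def log_linear_fit(days, ratios):
    """Least-squares fit of log(ratio) = intercept + slope * day.

    A good straight-line fit here means the ratio itself is growing
    exponentially, as ratio = exp(intercept) * exp(slope * day).
    """
    logs = [math.log(r) for r in ratios]
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def doubling_time(slope):
    """Days for the ratio to double, given the slope of log(ratio) per day."""
    return math.log(2) / slope

# Made-up data: a ratio growing 1% per day (in log terms) after day 150.
days = list(range(150, 200))
ratios = [2.0 * math.exp(0.01 * d) for d in days]
slope, intercept = log_linear_fit(days, ratios)
```

Calling doubling_time(slope) on the fitted slope then says how quickly the
ratio would double if the trend continued.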
>Oh, there are billions of things that could be tried. Who knows what might
>pay?
Aye, there are. I don't have billions of CPU-days to burn, though,
so I'm trying to winnow down to stuff that's likely to pay off.
Theoretical beauty is one measure that sort of appeals.
><wink -- but "two decimal digits" is just an artifact of how scores get
>displayed>.
No argument there. I have no particular love for that rule, either.
>Asymmetric bounds also have some attraction, since, e.g., in mistake-based
>training "by hand" I always end up moving the ham cutoff closer to 0 than
>the spam cutoff is to 1.
One thing that's occurred to me is to have the training cutoffs at
N sigma from mean (where N == .5?) for the two populations; how you'd
bootstrap that is an open question, of course.
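
A minimal sketch of the N-sigma idea, assuming a set of ham scores and spam
scores is already available (which is exactly the bootstrap problem noted
above). The function name, the choice of population standard deviation, and
the direction of the offsets (ham cutoff above the ham mean, spam cutoff below
the spam mean) are assumptions made for illustration, not an existing
SpamBayes option.

```python
from statistics import mean, pstdev

def sigma_cutoffs(ham_scores, spam_scores, n_sigma=0.5):
    """Return (ham_cutoff, spam_cutoff) placed n_sigma standard
    deviations from each population's mean, toward the middle.

    Hypothetical helper: scores are assumed to lie in [0, 1], with
    ham clustering near 0 and spam clustering near 1.
    """
    ham_cutoff = mean(ham_scores) + n_sigma * pstdev(ham_scores)
    spam_cutoff = mean(spam_scores) - n_sigma * pstdev(spam_scores)
    # Keep both cutoffs inside the valid score range.
    ham_cutoff = min(1.0, max(0.0, ham_cutoff))
    spam_cutoff = min(1.0, max(0.0, spam_cutoff))
    return ham_cutoff, spam_cutoff

# Made-up scores, only to show the call.
ham_cutoff, spam_cutoff = sigma_cutoffs([0.0, 0.02, 0.04], [0.9, 1.0])
```

Because each cutoff depends on its own population's spread, the bounds come
out asymmetric whenever the ham scores are more tightly clustered than the
spam scores, or vice versa.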
- Alex