[Spambayes] FYI: Java implementation
anthony at interlink.com.au
Tue Jan 21 16:35:59 EST 2003
>>> Tim Peters wrote
> It's the sharpness and spread of the separation in chi- that's attractive.
> Our experiments showed (most of mine were on a 34,000-msg database) that you
> could usually pick cutoffs equally good under Gary-combining, but that it
> took 3 decimal digits of precision to do so, best cutoffs kept shifting over
> time (== amount of training data) and across test sets, and that it wasn't
> possible to guess good values in advance.
It's also worth noting that, before chi-combining, the optimal cutoff values
ranged from 0.5-something to 0.7 depending on the person. It was impossible to
pick a single number that worked for everyone.
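For anyone who hasn't followed the chi-combining discussion, here is a minimal
Python sketch of chi-squared (Fisher) combining with a three-way ham/unsure/spam
decision, roughly along the lines of the SpamBayes classifier as I understand it.
The function names and the 0.20/0.90 cutoffs are illustrative assumptions, not
values taken from this thread. Strong evidence pushes the score to within a hair
of 0 or 1, while conflicting evidence lands near 0.5. That is why the cutoffs
don't need three decimal digits of tuning.

```python
import math

def chi2q(x2: float, v: int) -> float:
    """P(chi^2 >= x2) for an even number of degrees of freedom v (closed form)."""
    assert v % 2 == 0, "this closed form only holds for even v"
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)  # guard against rounding slightly above 1

def chi_combine(probs, eps=1e-6):
    """Combine per-token spam probabilities into one score in [0, 1]."""
    # Clamp so log() never sees exactly 0 (eps is an illustrative choice).
    ps = [min(max(p, eps), 1.0 - eps) for p in probs]
    n = len(ps)
    # Spam evidence: tokens with p near 1 make ln(1 - p) very negative.
    s = 1.0 - chi2q(-2.0 * math.fsum(math.log(1.0 - p) for p in ps), 2 * n)
    # Ham evidence: tokens with p near 0 make ln(p) very negative.
    h = 1.0 - chi2q(-2.0 * math.fsum(math.log(p) for p in ps), 2 * n)
    return (s - h + 1.0) / 2.0

def classify(score, ham_cutoff=0.20, spam_cutoff=0.90):
    """Three-way decision; the default cutoffs here are hypothetical."""
    if score < ham_cutoff:
        return "ham"
    if score >= spam_cutoff:
        return "spam"
    return "unsure"
```

For example, `classify(chi_combine([0.99] * 5))` comes out as spam, while a
message with two strongly spammy and two strongly hammy tokens scores near 0.5
and falls into the unsure band.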
(yes, I do plan to redo the plots from the same data set at some point,
and add some for the CLM combiners. If someone wants to do it first
and save me the effort, that would be faaaabulous)