[spambayes-dev] Give up on experimental_ham_spam_imbalance_adjustment?
Tim Peters
tim.one at comcast.net
Sun Sep 14 20:15:12 EDT 2003
[Tony Meyer]
>>> Since around April/May, I've had this option off,
[Tim]
>> Why did you turn it off?
[Tony]
> I was getting a lot of unsures and figured out (read a post?) that
> turning the option off might help. It immediately helped matters,
> which meant that I didn't fall back to plan 2 (retrain with equal
> numbers).
That's highly relevant, then! The option has no effect if there is in fact
an equal number of ham and spam training msgs, doesn't appear to make a
spit's worth of difference in my 2::1 spam::ham classifier, and made trouble
for you.
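
To illustrate why equal training counts make the option a no-op, here is a
minimal Python sketch. It is an assumption, not something spelled out in this
thread: it supposes the adjustment works by scaling each word's evidence count
by min(nspam/nham, 1) and min(nham/nspam, 1) before the usual Bayesian
smoothing toward an unknown-word prior. The function name, parameter names,
and default values are hypothetical and chosen only for illustration.

```python
def adjusted_word_prob(spamcount, hamcount, nspam, nham,
                       adjust_imbalance=False,
                       unknown_word_strength=0.45,
                       unknown_word_prob=0.5):
    """Hypothetical sketch of a per-word spam probability.

    spamcount/hamcount: number of trained spam/ham msgs containing the word.
    nspam/nham: total number of trained spam/ham msgs.
    """
    nspam = float(nspam or 1)
    nham = float(nham or 1)

    # Raw estimate from relative frequencies in each corpus.
    spamratio = spamcount / nspam
    hamratio = hamcount / nham
    total = spamratio + hamratio
    if total == 0:
        return unknown_word_prob  # word never seen in training
    prob = spamratio / total

    # ASSUMED form of the imbalance adjustment: discount evidence that
    # comes from the over-represented class.
    if adjust_imbalance:
        spam2ham = min(nspam / nham, 1.0)
        ham2spam = min(nham / nspam, 1.0)
    else:
        spam2ham = ham2spam = 1.0

    # Effective amount of evidence, then smoothing toward the prior.
    n = hamcount * spam2ham + spamcount * ham2spam
    s = unknown_word_strength
    return (s * unknown_word_prob + n * prob) / (s + n)


# Balanced training data: both scale factors are 1.0, so the option
# changes nothing.
balanced_on = adjusted_word_prob(3, 1, 100, 100, adjust_imbalance=True)
balanced_off = adjusted_word_prob(3, 1, 100, 100, adjust_imbalance=False)

# 2::1 spam::ham: a spam-only word counts for less evidence with the
# option on, so its probability sits closer to the 0.5 prior.
skewed_on = adjusted_word_prob(4, 0, 200, 100, adjust_imbalance=True)
skewed_off = adjusted_word_prob(4, 0, 200, 100, adjust_imbalance=False)
```

Under these assumptions, the skewed case shows how the option can pull word
scores toward the middle, which would be consistent with seeing more unsures
on unbalanced training data.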
>> The option has no effect on training speed
> I meant 'training speed' as in the number of messages that have to be
> trained in order for a similar (to human eyes) message to be correctly
> classified. A different phrase would have made that clearer, but I
> can't think of one to use :)
Let's call it training efficiency: how much good you get out of a fixed
(but secret <wink>) number of training messages. For example, I'm sure the
scheme mixing unigrams and bigrams has higher training efficiency than the
current unigram-only scheme, although given *enough* training data no
previous experiment showed higher accuracy either way. Enabling
experimental_ham_spam_imbalance_adjustment has had a bad effect on training
efficiency for a number of people with unbalanced training data (including
you). I think it would still be better to get the training data in balance,
but since it's not easy to force people to do that, it's tilting at
windmills.
>> We've been lax since then about getting loser code out of the
>> codebase.
> I've occasionally wondered if there's any point keeping the
> gary_combining_scheme stuff in there. No-one's using that any more,
> are they?
We agreed to get rid of that long ago, IIRC around the time of the first
alpha release. I guess it's just that nobody has gotten around to it yet.
Feel encouraged.