[spambayes-dev] Give up on experimental_ham_spam_imbalance_adjustment?

Tim Peters tim.one at comcast.net
Sun Sep 14 20:15:12 EDT 2003


[Tony Meyer]
>>> Since around April/May, I've had this option off,

[Tim]
>> Why did you turn it off?

[Tony]
> I was getting a lot of unsures and figured out (read a post?) that
> turning the option off might help.  It immediately helped matters,
> which meant that I didn't fall back to plan 2 (retrain with equal
> numbers).

That's highly relevant, then!  The option has no effect if there is in fact
an equal number of ham and spam training msgs, doesn't appear to make a
spit's worth of difference in my 2::1 spam::ham classifier, and made trouble
for you.
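
A minimal sketch of why balance matters here, assuming the option works by
discounting the evidence count of whichever class is over-represented in
training.  The function name, its parameters, and the s/x defaults are
illustrative assumptions, not copied from the codebase:

```python
def word_spamprob(hamcount, spamcount, nham, nspam, adjust, s=0.45, x=0.5):
    """Smoothed spam probability for one word (hypothetical helper).

    hamcount/spamcount: number of trained ham/spam msgs containing the word.
    nham/nspam: total number of trained ham/spam msgs.
    adjust: whether to apply the assumed imbalance adjustment.
    s, x: assumed unknown-word strength and unknown-word probability.
    """
    if hamcount + spamcount == 0:
        # Never-seen word: fall back to the unknown-word probability.
        return x

    hamratio = hamcount / nham
    spamratio = spamcount / nspam
    prob = spamratio / (hamratio + spamratio)

    if adjust:
        # Assumption: scale down evidence from the larger class only.
        spam2ham = min(nspam / nham, 1.0)
        ham2spam = min(nham / nspam, 1.0)
    else:
        spam2ham = ham2spam = 1.0

    # Effective amount of evidence behind this word.
    n = hamcount * spam2ham + spamcount * ham2spam
    return (s * x + n * prob) / (s + n)


# Balanced training: the adjustment changes nothing.
balanced_on = word_spamprob(3, 1, 100, 100, adjust=True)
balanced_off = word_spamprob(3, 1, 100, 100, adjust=False)

# 2::1 spam::ham training, word seen in 4 spam and 0 ham.
skewed_on = word_spamprob(0, 4, 100, 200, adjust=True)
skewed_off = word_spamprob(0, 4, 100, 200, adjust=False)
```

Under these assumptions both scale factors are 1.0 when nham == nspam, so
the option is a no-op.  With unbalanced training, a spam-only word is pulled
back toward the unknown-word probability, which would tend to produce more
unsures.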

>> The option has no effect on training speed

> I meant 'training speed' as in the number of messages that have to be
> trained in order for a similar (to human eyes) message to be correctly
> classified.  A different phrase would have made that clearer, but I
> can't think of one to use :)

Let's call it training efficiency:  how much good you get out of a fixed
(but secret <wink>) number of training messages.  For example, I'm sure the
scheme mixing unigrams and bigrams has higher training efficiency than the
current unigram-only scheme, although given *enough* training data none of
the previous experiments showed higher accuracy either way.  Enabling
experimental_ham_spam_imbalance_adjustment has had a bad effect on training
efficiency for a number of people with unbalanced training data (including
you).  I think it would still be better to get the training data in balance,
but since it's not easy to force people to do that, it's tilting at
windmills.

>> We've been lax since then about getting loser code out of the
>> codebase.

> I've occasionally wondered if there's any point keeping the
> gary_combining_scheme stuff in there. No-one's using that any more,
> are they?

We agreed to get rid of that long ago, IIRC around the time of the first
alpha release.  I guess it's just that nobody has gotten around to it yet.
Feel encouraged.



