[spambayes-dev] Give up on experimental_ham_spam_imbalance_adjustment?
Meyer, Tony
T.A.Meyer at massey.ac.nz
Sat Sep 13 00:18:05 EDT 2003
> I'd like to ask everyone running the Outlook addin to change
> it to False in their default_bayes_customize.ini file, and just
> live with that for a week, noting any new peculiarities.
Since around April/May, I've had this option off, and I generally run
with an imbalance of roughly 1:10 ham to spam [1] - it's 418:4660 at the
moment. I've been happy with the results, both in terms of correct
classification and training speed.
Do you want people like me, who have it off, to turn it on for a week?
(If I do, I'll turn off the mixed uni/bigram scheme for the week, too).
I think the option tends to help with small imbalances (up to 1:5, for
example), and then starts to confuse people. Unfortunately, in real
life this teaches people the wrong thing - they train and things
improve, so they keep doing it, and then it starts to go wrong again.
If this is true (that it's good up to a certain imbalance) then the
plug-in could be smart enough to disable the option if the imbalance
reached a certain level - or it could warn the user that their training
method isn't that good (I know, I should test whether I'm right, but I
don't have the time at the moment).
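
As a rough illustration of the "disable it past a certain imbalance" idea above, here is a minimal sketch. The function names and the 5.0 cut-off are hypothetical (the cut-off simply echoes the untested "up to 1:5" guess) and are not part of the SpamBayes code or the plug-in.

```python
def imbalance_ratio(nham, nspam):
    """Return the larger training count divided by the smaller one.

    Returns 1.0 when nothing has been trained yet, and infinity when
    only one of the two classes has been trained.
    """
    lo, hi = sorted((nham, nspam))
    if lo == 0:
        return float("inf") if hi else 1.0
    return hi / lo


def adjustment_allowed(nham, nspam, max_ratio=5.0):
    """Decide whether the imbalance adjustment should stay enabled.

    max_ratio is an assumed threshold, not a tested value.
    """
    return imbalance_ratio(nham, nspam) <= max_ratio


# With the counts quoted earlier (418 ham, 4660 spam) the ratio is
# well above 5, so this sketch would switch the adjustment off.
if not adjustment_allowed(418, 4660):
    print("imbalance too large: disable the adjustment or warn the user")
```

The same check could drive a warning instead of silently flipping the option; which is better would depend on the testing mentioned above.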
Or if the plug-in was even smarter ;) then it could auto-manage the
corpora. If the user is training on too many spam, start automatically
training on all messages that are replied to. If the user is training
on too many ham, start subscribing her to junk lists <wink>.
In the meantime, I think the default could be changed to False. At
least the reason for things going 'wrong' is then more obvious to the
people who have no idea how it works.
=Tony Meyer
[1] Because I'm too lazy to create a sub-corpus of my spam collection,
or ham collection, so I use almost all my spam (to this address), plus
misclassified mail and whatever's in the inbox at training time.