[Spambayes] Experimental Ham/Spam imbalance setting

Moore, Paul Paul.Moore at atosorigin.com
Thu May 22 13:58:23 EDT 2003


I have a friend who is using the POP3 proxy for his mail. He has a
10:1 spam:ham imbalance, and he's found that he gets quite a high
proportion of unsures (from 200 or so mails a day, over 75% of which
are spam). His DB contains about 1300 spam and 150 ham. In addition
to the unsure rate being high, he's finding that training on the
unsures isn't helping. I suspect that this is because the Ham/Spam
imbalance setting means that training one unsure as spam has little
effect (10% of the effect it'd have on a balanced DB?). Am I right in
thinking that pop3proxy has this parameter set to true? I know it is
for the Outlook plugin (which I use, but I have a fairly balanced DB
these days).
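
To make the suspicion above concrete, here is a rough sketch of how I
understand the adjustment to work. It is not copied from the Spambayes
source: the function name, argument names and the default values for
s (unknown word strength, 0.45) and x (unknown word probability, 0.5)
are assumptions for illustration. The idea is that a token's evidence
count n is scaled down on the over-represented side, so that a token seen
in one spam counts for only about nham/nspam of a sighting.

```python
def word_spam_prob(spamcount, hamcount, nspam, nham,
                   adjust_imbalance, s=0.45, x=0.5):
    """Smoothed spam probability for one token (illustrative sketch).

    Assumes the token has been seen at least once and that both
    nspam and nham are non-zero.
    """
    spamratio = spamcount / nspam
    hamratio = hamcount / nham
    raw = spamratio / (spamratio + hamratio)

    if adjust_imbalance:
        # Scale down counts from whichever class is over-represented.
        spam2ham = min(nspam / nham, 1.0)
        ham2spam = min(nham / nspam, 1.0)
    else:
        spam2ham = ham2spam = 1.0

    # n is the effective amount of evidence for this token.
    n = hamcount * spam2ham + spamcount * ham2spam

    # Pull the raw estimate towards x; the less evidence, the stronger the pull.
    return (s * x + n * raw) / (s + n)


# A new token seen in one spam only, in a DB of 1300 spam and 150 ham.
off = word_spam_prob(1, 0, 1300, 150, adjust_imbalance=False)
on = word_spam_prob(1, 0, 1300, 150, adjust_imbalance=True)
print(round(off, 3), round(on, 3))
```

Under these assumptions the same single training message moves the token
much less far from 0.5 with the adjustment on than with it off, which
would fit the unresponsiveness my friend is seeing. In this sketch the
adjustment is applied when a score is computed from the stored counts,
not when training; whether the real code behaves that way is exactly
the retrain question raised further down.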

Is there any consensus yet on whether the setting is a good thing?
My feeling is that the higher proportion of unsures, plus the
unresponsiveness to training, makes it an overall loss. I got the same
qualitative results myself when my DB was badly unbalanced - that's
why I made the effort to make and keep my DB balanced. But I have no
corresponding feel for the real-life results with the parameter *not*
set.

My friend has now purged his database and is starting from scratch,
to try to improve his results. I mentioned the setting, but as it's
a config file edit, rather than a button in the UI, he didn't feel
comfortable changing it (and AIUI, he'd need to retrain as well - is
that right?)

Maybe the option should be exposed in the UI (but that may not be
sensible if changing it *does* require a retrain). If it is, then the
help could explain that this option is only relevant if your database
has unequal numbers of ham and spam, and what the disadvantages of
each setting are (option set = more unsures, less responsive to
training; option unset = ???)

If the option isn't exposed, I'd vote for taking it out. We're not
getting any useful new feedback that I'm aware of.

Paul.


