[Spambayes] Problem with "Delete As Spam" in Outlook plugin ..

Tim Peters tim.one at comcast.net
Fri Sep 12 13:19:21 EDT 2003


[Mark G. Spencer]
> I get a massive amount of spam as I have promiscuous receipt of email
> enabled on one of my domains.  I've noticed that after a week of
> training, the SpamBayes Outlook plugin rarely filters out spam, and
> very infrequently filters email as unsure.  I understand part of this
> may be due to having 120 emails trained as good, and 4,000 as bad?

Yes, imbalance is bad, and a factor of more than 30 difference is extreme.
The system wasn't designed, tested, or tuned with extremely unbalanced
amounts of training data, and we still don't have a good scheme for dealing
with it when it occurs.  If you find your default_bayes_customize.ini and
change the line

experimental_ham_spam_imbalance_adjustment: True

to

experimental_ham_spam_imbalance_adjustment: False

your specific complaint today should go away, but it's also likely that
your false positive (FP) rate will increase, since all messages will then
tend to get spammier scores.
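
If you'd rather script that edit than make it by hand, here is a minimal
Python sketch.  The helper name disable_imbalance_adjustment is made up for
illustration and is not part of SpamBayes.  It assumes only that the option
sits on a line of its own in the form shown above, and it works on the
file's text rather than guessing where the file lives.

```python
import re

OPTION = "experimental_ham_spam_imbalance_adjustment"

# Matches "name: value" or "name = value" on a single line.
# Group 1 keeps the name and separator, group 2 keeps anything after the value.
_PATTERN = re.compile(
    r"^([ \t]*" + re.escape(OPTION) + r"[ \t]*[:=][ \t]*)\S+(.*)$",
    re.MULTILINE,
)

def disable_imbalance_adjustment(ini_text):
    """Return ini_text with the option's value set to False.

    Hypothetical helper: every other line is returned unchanged, and
    text that doesn't contain the option comes back as-is.
    """
    return _PATTERN.sub(r"\g<1>False\g<2>", ini_text)
```

To use it, read the .ini file into a string, pass it through the function,
and write the result back.  Keep a backup copy of the original file first.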

> From some responses I've seen here it looks like I need to have a
> more even distribution of good and bad email trained?  This won't
> (and can't ever) work for me as I get buried by mountains of spam,
> but I'm way off my topic now .. ;)

No, how you train is entirely up to you.  If you get a great deal more spam
than ham but want to keep your training data balanced, it's really not hard
to do so:  throw out most of your spam, keeping only a fraction of it to
train on.  Or, if you want to keep all the spam (beats me ... up to you),
copy a small fraction of it to a new here's-the-spam-I-train-on folder.
This is a statistical system, and doesn't need exhaustive training.  Indeed,
if there were a feasible way to do it, it would be best if the system were
able to train on a relatively small *random* sample of all the email you
get.
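
As a rough illustration of keeping only a random fraction of the spam, here
is a small Python sketch.  The function pick_spam_to_train and its
parameters are hypothetical, not anything in SpamBayes; it just picks a
random subset of spam identifiers sized to match the amount of ham you've
trained on.

```python
import random

def pick_spam_to_train(spam_ids, n_ham, ratio=1.0, seed=None):
    """Return a random subset of spam_ids with about ratio * n_ham items.

    Hypothetical helper.  spam_ids can be any sequence of message
    identifiers.  If there is less spam than the target, all of it is
    returned (in random order).
    """
    target = min(len(spam_ids), max(0, int(round(ratio * n_ham))))
    rng = random.Random(seed)  # pass a seed for a repeatable choice
    return rng.sample(list(spam_ids), target)

if __name__ == "__main__":
    # The numbers from this thread: 4,000 spam against 120 ham.
    spam = ["spam-%d" % i for i in range(4000)]
    chosen = pick_spam_to_train(spam, n_ham=120)
    print(len(chosen), "of", len(spam), "spam kept for training")
```

With ratio=1.0 this trims 4,000 spam down to 120, which takes the imbalance
from roughly 33:1 to 1:1.  The messages it returns are the ones you'd copy
to the train-on folder.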

> Anyway .. When I "Delete As Spam" a small number of emails, say fifty
> of them, I get the hourglass for a few seconds, my disk spins, and
> the emails are properly transferred.  When I "Delete as Spam" a large
> number of emails, say five hundred or even thousands, the "Delete As
> Spam" option stays grey for a few seconds, then goes back to its
> normal color like I hadn't pressed it!  The disk doesn't spin, no
> hourglass, nothing.  I've waited a couple minutes, and nothing
> happened.

Heh -- I didn't even know you *could* select multiple messages before
hitting "Delete as Spam".  So, sorry, don't know anything about this.

But change the option as described above, or balance your training data
(also as described above), and you probably won't care about this other
issue anymore.



