[Spambayes] Routine training on correctly classified email?

Fri Dec 5 15:55:23 EST 2003

Eamon Egan wrote:
> I'm sure this has been asked or discussed before, but since I'm new
> to the mailing list and couldn't find an easy way to search the
> archives (without downloading the lot of them), here goes.
> 
> I gather that spambayes does not train itself automatically on
> messages classified as spam (or not-spam). It only trains on mistakes.
> 
> I just saw a message referring to the practice of manually training on
> caught spam.
> 
> Can anyone give me a quick rundown on the merits of training (perhaps
> with a lower weight) on ham and spam that is correctly classified? It
> would seem that if spam and ham evolve, this would allow the
> classifier to track any trends. Is (optionally) doing this routinely
> considered a useful feature and is it being considered for inclusion?

The Unix SpamBayes filter has an all-or-nothing option to train on all
messages that are classified as certain ham or certain spam, but this is
not currently supported for Outlook.

I'm currently working on some *very* experimental options to add
configurable automatic training and automatic balancing to the Outlook
plugin.  However, I'm sure it will take a bit of tweaking before these
would even be worth considering for a public release, and I'm short on
time to put into it at the moment.

So far, we have no definitive proof that automatic training is any
better or worse than mistake-based training.  I'm sure it depends a lot
on your particular mix of ham and spam.  There is still a lot of work to
be done in determining if there is a "best" method of training.
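
To make the two regimes concrete, here is a minimal sketch in Python.
It is only an illustration: ToyClassifier, classify(), process(), and
the cutoff values are hypothetical names and numbers, not the SpamBayes
API, and the toy naive Bayes score is not the scoring SpamBayes actually
uses. It also assumes the true label of each message is known, which a
real filter only learns when the user corrects it.

```python
import math
from collections import Counter

# Illustrative thresholds (assumptions, not SpamBayes defaults).
HAM_CUTOFF, SPAM_CUTOFF = 0.2, 0.9


class ToyClassifier:
    """Hypothetical stand-in for a token-based filter."""

    def __init__(self):
        self.counts = {"ham": Counter(), "spam": Counter()}
        self.nmsgs = {"ham": 0, "spam": 0}

    def train(self, tokens, label):
        # Count each token at most once per message.
        self.counts[label].update(set(tokens))
        self.nmsgs[label] += 1

    def score(self, tokens):
        """Spam probability in [0, 1]: naive Bayes, add-one smoothing."""
        logodds = 0.0
        for tok in set(tokens):
            p_spam = (self.counts["spam"][tok] + 1) / (self.nmsgs["spam"] + 2)
            p_ham = (self.counts["ham"][tok] + 1) / (self.nmsgs["ham"] + 2)
            logodds += math.log(p_spam / p_ham)
        return 1.0 / (1.0 + math.exp(-logodds))


def classify(clf, tokens):
    """Map a score to 'spam', 'ham', or 'unsure'."""
    s = clf.score(tokens)
    if s >= SPAM_CUTOFF:
        return "spam"
    if s <= HAM_CUTOFF:
        return "ham"
    return "unsure"


def process(clf, tokens, true_label, regime):
    """Handle one message under regime 'mistakes' or 'auto'.

    Both regimes train when the verdict is wrong or unsure (the user
    corrects it). Only 'auto' also trains on confident, correct verdicts.
    """
    verdict = classify(clf, tokens)
    if verdict != true_label:
        clf.train(tokens, true_label)
    elif regime == "auto":
        clf.train(tokens, verdict)
    return verdict
```

Under "mistakes" the token counts stop changing once a kind of message
is classified confidently, whereas under "auto" they keep growing with
every confident verdict. That is the trend-tracking effect asked about
above, and also why some form of balancing may be needed if one class
arrives far more often than the other.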

-- 
Kenny Pitt