[Spambayes] General training questions

Tony Meyer tameyer at ihug.co.nz
Thu Oct 14 06:07:15 CEST 2004

> Well, in the five days it's been working I haven't got any 
> ham yet. I had more or less abandoned that address over the
> last year, and haven't sent COA notices yet, so the only
> people who will send to it are people I 
> haven't corresponded with for a long time.
> I was waiting for SB to reduce the spam load, and it's now 
> getting well over half.
> Any mail left in my inbox is falsely categorised as ham, 
> right? So I'll start adding that to the training folder as
> well as the unsure stuff.
> And once I get the ham flowing, I guess SB will get even better. . .

If you're using SpamBayes to try and spot a rare instance of ham amongst a
lot of spam, then the regular training patterns probably don't apply so much
(they've been tested on the more typical balances of ham & spam).

In this case, perhaps try building up a collection of sample mail from the
people that are likely to still have that address (or just a representative
sample of all your ham) and using that as the ham training data.  For spam,
just collect a sample of about the same size and use that.  If you don't get any
ham, then you can probably skip ongoing training for the most part (maybe
just train on false negatives, and ignore unsures).
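
As a rough sketch of that approach (plain Python only, not the SpamBayes
API; the function names and the "ham"/"spam"/"unsure" label strings are made
up for illustration, and how you actually feed the samples to your trainer
depends on your setup):

```python
import random

def balanced_training_sets(ham, spam, seed=0):
    """Pick equally sized ham and spam samples for initial training.

    `ham` and `spam` are lists of messages (any objects). The smaller
    collection decides the sample size, so neither class dominates.
    """
    rng = random.Random(seed)          # fixed seed for repeatable samples
    n = min(len(ham), len(spam))
    return rng.sample(ham, n), rng.sample(spam, n)

def should_train(true_label, predicted_label):
    """Ongoing training rule: train on false negatives only.

    A false negative here is spam that was filed as ham. Unsures and
    correctly classified mail are skipped.
    """
    return true_label == "spam" and predicted_label == "ham"
```

With, say, a handful of saved ham and a few thousand spam, this gives you a
handful of each to train on, and afterwards you would only train on spam
that slipped through to the inbox.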

Basically, this is an unusual use-case, so you'll probably have to try a few
things out and figure out what works best for you.

=Tony Meyer

Please always include the list (spambayes at python.org) in your replies
(reply-all), and please don't send me personal mail about SpamBayes. This
way, you get everyone's help, and avoid a lack of replies when I'm busy.
