Tim Stone - Four Stones Expressions
tim at fourstonesExpressions.com
Wed Feb 12 07:52:55 EST 2003
2/12/2003 7:41:15 AM, "Justin F. Knotzke" <jknotzke at shampoo.ca> wrote:
> I just installed spambayes. I was using spamassassin and it was
>working rather well, but I figured I'd give spambayes a go.
> I have a few questions regarding training.
> 1) I only had about 10 spam messages to train on. Is there somewhere
>I can download a larger collection of spam, or is it better to simply
>train on spam as it arrives?
Just go ahead and train on spam as it arrives. Spambayes will become quite
accurate very quickly, and your definition of 'spam' is different from anyone
else's. We have considered creating some stock databases, such as porn spam,
but even then it's difficult to know where someone might draw the line (is
Victoria's Secret mail porn?). When you train on your own spam, you're the one
doing the defining, and you'll be very pleased with the results.
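To illustrate why per-user, train-as-it-arrives works, here is a toy sketch of incremental training: every message you classify updates per-word counts, so the word probabilities reflect *your* spam and *your* ham. This is NOT the Spambayes API. The class name ToyTrainer, the tokenizer, and the prior values s=0.45 and x=0.5 are assumptions for illustration, using a Gary Robinson-style adjustment that pulls rarely seen words toward a neutral 0.5.

```python
import re
from collections import Counter


class ToyTrainer:
    """Hypothetical incremental word-count trainer (illustration only)."""

    def __init__(self, s=0.45, x=0.5):
        self.spam_counts = Counter()  # word -> number of spam messages containing it
        self.ham_counts = Counter()   # word -> number of ham messages containing it
        self.nspam = 0                # spam messages trained so far
        self.nham = 0                 # ham messages trained so far
        self.s = s                    # assumed strength of the neutral prior
        self.x = x                    # assumed probability for unseen words

    @staticmethod
    def tokens(text):
        # Crude tokenizer: each distinct lowercase word counts once per message.
        return set(re.findall(r"[a-z0-9']+", text.lower()))

    def train(self, text, is_spam):
        # Update counts with one arriving message.
        toks = self.tokens(text)
        if is_spam:
            self.nspam += 1
            self.spam_counts.update(toks)
        else:
            self.nham += 1
            self.ham_counts.update(toks)

    def spamprob(self, word):
        spam = self.spam_counts[word]
        ham = self.ham_counts[word]
        n = spam + ham
        if n == 0:
            return self.x  # never seen: neutral
        spam_ratio = spam / self.nspam if self.nspam else 0.0
        ham_ratio = ham / self.nham if self.nham else 0.0
        p = spam_ratio / (spam_ratio + ham_ratio)
        # Robinson-style adjustment: shrink p toward x when n is small.
        return (self.s * self.x + n * p) / (self.s + n)


t = ToyTrainer()
t.train("cheap pills buy now", is_spam=True)
t.train("meeting notes for tuesday", is_spam=False)
print(round(t.spamprob("pills"), 3))    # 0.845
print(round(t.spamprob("tuesday"), 3))  # 0.155
```

Even after a single spam and a single ham, the words you actually receive start separating; each further message you train on sharpens the counts for your own mail stream.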
> 2) Regarding the cron job that does the training, is it OK to train
>on mail more than once? I move mail from my inbox to an mbox (stored
>mail) that gets archived every month. If the cron job runs every morning
>it will train on a folder that it has already trained on, plus a few
It certainly doesn't *hurt* to train on the same mail more than once. It
simply counts that message's words again, which pushes their spam probability
further up. It'd be the same as receiving two copies of the same message and
training on both. Personally, I'd try to avoid that, but that's just me. No
harm, no foul...
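If you'd rather avoid retraining on mail the cron job has already seen, one option is to remember each message's Message-ID and skip those on the next run. The sketch below uses Python's standard mailbox module; the function names, the one-ID-per-line "seen" file, and the train_fn callback are all hypothetical, with train_fn standing in for whatever actually feeds a message to the Spambayes trainer.

```python
import mailbox


def train_new_messages(mbox_path, seen_ids, train_fn):
    """Call train_fn on each message whose Message-ID is not in seen_ids.

    seen_ids is a set updated in place. train_fn is a hypothetical hook,
    not a Spambayes function. Returns how many messages were trained on.
    """
    trained = 0
    box = mailbox.mbox(mbox_path)
    try:
        for msg in box:
            msg_id = msg.get("Message-ID")
            # Skip mail already trained on; mail lacking a Message-ID is
            # also skipped, since there is no way to track it between runs.
            if msg_id is None or msg_id in seen_ids:
                continue
            train_fn(msg)
            seen_ids.add(msg_id)
            trained += 1
    finally:
        box.close()
    return trained


def load_seen(path):
    # Read one Message-ID per line; a missing file means nothing seen yet.
    try:
        with open(path) as f:
            return {line.strip() for line in f if line.strip()}
    except FileNotFoundError:
        return set()


def save_seen(path, seen_ids):
    # Persist the IDs so the next cron run can skip them.
    with open(path, "w") as f:
        for msg_id in sorted(seen_ids):
            f.write(msg_id + "\n")
```

A nightly job would call load_seen, then train_new_messages on the stored-mail mbox, then save_seen. When the mbox is archived each month, the seen file can be trimmed or kept; stale IDs only cost a little disk space.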
>Justin F. Knotzke
>jknotzke at shampoo.ca
That's me - TimS