[Spambayes] Confusion about Unix or Linux documentation
skip at pobox.com
Thu Jul 22 01:09:33 CEST 2004
Aaron> If the messages that the user has identified as spam are placed
Aaron> in the same file as the messages that spambayes has already
Aaron> identified as spam, then we would be training on messages that
Aaron> have never been used in training but spambayes has already
Aaron> identified as spam.
Aaron> Is this a useful thing to do?
That depends on your desired training strategy. With a train-on-everything
strategy you'd certainly want that behavior, but I agree: in this day and
age, with 80-something percent of all mail sent purportedly being spam, it
can grow your training database rapidly and perhaps unnecessarily. Still, to
catch false positives you have to save them somewhere. My procmail setup
saves every message that scores as spam and isn't deleted outright into
a specific mailbox, which I scan periodically. The only things in that
mailbox I'd want to train on are false positives, which are rare beasties.
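
For illustration only, the filing logic described above might look roughly like this in Python. It is not the actual procmail recipe. The function name `route_message`, the folder names, and both cutoff values are assumptions made up for this sketch.

```python
def route_message(score, spam_cutoff=0.9, delete_cutoff=0.999):
    """Decide where a scored message goes.

    score is a spam probability in [0, 1]. Both cutoffs are
    hypothetical values chosen for this sketch.
    """
    if score >= delete_cutoff:
        return "delete"        # so spammy it is dropped outright
    if score >= spam_cutoff:
        return "spam-review"   # saved, scanned now and then for false positives
    return "inbox"


for s in (0.9995, 0.95, 0.10):
    print(s, "->", route_message(s))
```

Only messages that land in the review folder and turn out to be ham would be candidates for training.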
You might check here:
for some of the (many) training strategy options. I use train-to-exhaustion
in such a way that throwing a message into the ham or spam pile that would
already score correctly gets it tossed out on its bum on the next pass, so
having a few extra of something is no big deal.
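
A minimal sketch of the train-to-exhaustion idea, assuming a toy word-count scorer in place of the real SpamBayes classifier. `ToyClassifier`, `train_to_exhaustion`, the cutoffs, and the sample messages are all hypothetical. Each pass trains only on messages that do not yet score correctly, and stops once a full pass needs no training.

```python
import math
from collections import Counter


class ToyClassifier:
    """Tiny word-count scorer; a stand-in, not the SpamBayes classifier."""

    def __init__(self):
        self.counts = {"ham": Counter(), "spam": Counter()}

    def train(self, text, label):
        # Count each distinct word once per message.
        self.counts[label].update(set(text.lower().split()))

    def score(self, text):
        """Return a spam probability in [0, 1]; 0.5 means no evidence."""
        log_odds = 0.0
        for word in set(text.lower().split()):
            spam = self.counts["spam"][word] + 1  # add-one smoothing
            ham = self.counts["ham"][word] + 1
            log_odds += math.log(spam / ham)
        return 1.0 / (1.0 + math.exp(-log_odds))


def train_to_exhaustion(classifier, piles, ham_cutoff=0.2,
                        spam_cutoff=0.9, max_passes=20):
    """Train only on messages that score wrong; return what was trained on."""
    trained = []
    for _ in range(max_passes):
        missed = 0
        for label, messages in piles.items():
            for text in messages:
                s = classifier.score(text)
                correct = s >= spam_cutoff if label == "spam" else s <= ham_cutoff
                if not correct:
                    classifier.train(text, label)
                    trained.append(text)
                    missed += 1
        if missed == 0:  # a clean pass: nothing left to learn from
            break
    return trained


piles = {
    "ham": ["lunch meeting tomorrow", "lunch meeting tomorrow",
            "project status report"],
    "spam": ["cheap pills online", "win cheap money now"],
}
clf = ToyClassifier()
used = train_to_exhaustion(clf, piles)
print(len(used), "of", sum(len(m) for m in piles.values()), "messages trained on")
```

In this sketch the duplicate ham message already scores correctly by the time it is reached, so it is never trained on, which is why a few extra copies in a pile do little harm.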