[Spambayes] Love Spambayes, but I wonder if
Russ Foster
rjf at russfoster.com
Tue Nov 8 22:43:45 CET 2005
I've started experimenting with Spambayes a bit more on my home Linux
machine.
I have two directories: TrainSpam and TrainHam
I put false positives/negatives and unsures in the appropriate directory.
Every 5 minutes a cron job trains on those directories.
Once a day, another cron job:
- purges anything in these two directories that is older than 7 days.
- moves my existing 'hammiedb' file
- creates a new 'hammiedb' file
- forces a re-training on the TrainSpam and TrainHam directories
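For anyone who wants to try the same thing, the daily job above could be sketched roughly like this in Python. This is only an illustration, not the actual script: the function names (purge_old, rotate_db, daily_rebuild) are made up for the example, and the training step is left as a caller-supplied `train` callable because the exact SpamBayes command line is not shown here. It assumes the new 'hammiedb' file is created by whatever `train` runs.

```python
import shutil
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def purge_old(directory, max_age_days=7, now=None):
    """Delete regular files in `directory` last modified more than
    `max_age_days` ago. Returns the sorted names of the removed files."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    removed = []
    for path in Path(directory).iterdir():
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)


def rotate_db(db_path, now=None):
    """Move an existing database file aside under a timestamped name.
    Returns the backup path, or None if there was nothing to move."""
    db_path = Path(db_path)
    if not db_path.exists():
        return None
    stamp = time.strftime("%Y%m%d%H%M%S", time.localtime(now))
    backup = db_path.with_name(db_path.name + "." + stamp)
    shutil.move(str(db_path), str(backup))
    return backup


def daily_rebuild(spam_dir, ham_dir, db_path, train, max_age_days=7, now=None):
    """Purge old training mail, move the old database aside, then retrain.

    `train` is a hypothetical callable taking (db_path, ham_dir, spam_dir);
    in practice it would run the SpamBayes training command of your choice
    and is expected to create a fresh database at `db_path`."""
    purge_old(spam_dir, max_age_days, now)
    purge_old(ham_dir, max_age_days, now)
    backup = rotate_db(db_path, now)
    train(db_path, ham_dir, spam_dir)
    return backup
```

Keeping the training command behind a callable means the purge and rotate steps can be tried on a scratch directory before pointing the job at a real database.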
While I don't have anything quantitative, the number of false negatives and
false positives seems to be drastically reduced.
This script has the effect of only keeping words in the database that have
been seen in the past 7 days, which accounts, somewhat, for the change in the
character of spam (and ham).
Maybe once a day is overkill...but right now my system has cycles to
spare.
-r
On Tue, 8 Nov 2005, Jesse Pelton wrote:
> See
> http://spambayes.sourceforge.net/faq.html#can-i-share-move-my-training-data-from-one-computer-to-another
> for how to do this.
>
> But there's a price for that answer: I'm going to give you my opinion as
> well. I wouldn't bother copying the training data. SpamBayes learns very
> quickly, and the character of the spam I receive changes over time, so
> rather than hanging on to training data, I delete it and retrain from
> scratch periodically. Within a day or two I find I'm getting better
> results.