[Spambayes] Love Spambayes, but I wonder if

Russ Foster rjf at russfoster.com
Tue Nov 8 22:43:45 CET 2005


I've started experimenting with Spambayes a bit more on my home Linux 
machine.

I have two directories: TrainSpam and TrainHam

I put false positives/negatives and unsures in the appropriate directory.
Every 5 minutes a cron job trains on those directories.

Once a day, another cron job:

- purges anything in these two directories that is older than 7 days.

- moves my existing 'hammiedb' file

- creates a new 'hammiedb' file

- forces a re-training on the TrainSpam and TrainHam directories
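
For anyone who wants to try the same thing, here is a rough Python sketch of 
the daily job described above. It is not the actual script: the function 
names, the ".old" backup suffix and the train_cmd argument are all made up 
for illustration, and it assumes the training command creates a fresh 
'hammiedb' when none exists.

```python
import shutil
import subprocess
import time
from pathlib import Path

MAX_AGE_DAYS = 7  # retention window mentioned in the post


def purge_old(directory, max_age_days=MAX_AGE_DAYS, now=None):
    """Delete regular files in `directory` last modified before the cutoff."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    for entry in Path(directory).iterdir():
        if entry.is_file() and entry.stat().st_mtime < cutoff:
            entry.unlink()
            removed.append(entry.name)
    return sorted(removed)


def rotate_db(db_path):
    """Move the existing database aside so training starts from scratch."""
    db_path = Path(db_path)
    if not db_path.exists():
        return None
    # Hypothetical naming choice: keep one backup next to the original.
    backup = db_path.with_name(db_path.name + ".old")
    shutil.move(str(db_path), str(backup))
    return backup


def daily_retrain(spam_dir, ham_dir, db_path, train_cmd):
    """Purge old mail, set the old database aside, then retrain."""
    purge_old(spam_dir)
    purge_old(ham_dir)
    rotate_db(db_path)
    # train_cmd is an assumption: pass in whatever command the 5-minute
    # cron job already runs against TrainSpam and TrainHam.
    subprocess.run(train_cmd, check=True)
```

Keeping the old database as a backup rather than deleting it makes it easy 
to roll back if a retrain goes badly.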

While I don't have anything quantitative, the number of false negatives and 
false positives seems to be drastically reduced.

This script has the effect of only keeping words in the database that have 
been seen in the past 7 days, which accounts, somewhat, for the change in 
the character of spam (and ham).

Maybe once a day is overkill...but right now my system has cycles to 
spare.

-r



On Tue, 8 Nov 2005, Jesse Pelton wrote:

> See
> http://spambayes.sourceforge.net/faq.html#can-i-share-move-my-training-data-from-one-computer-to-another
> for how to do this.
>  
> But there's a price for that answer: I'm going to give you my opinion as
> well. I wouldn't bother copying the training data. SpamBayes learns very
> quickly, and the character of the spam I receive changes over time, so
> rather than hanging on to training data, I delete it and retrain from
> scratch periodically. Within a day or two I find I'm getting better
> results.
> 
> 


