[Spambayes] Is Equal Ham & Spam really the best?

Amedee Van Gasse amedee at amedee.be
Mon Jul 30 09:56:00 CEST 2007


On Mon, July 30, 2007 00:57, skip at pobox.com wrote:
>
>     Amedee> I have the same experience:
>
>     Amedee> amedee at elbereth { ~ }$ ./spamstats
>     Amedee>  Spam: 2415 Ham: 651
>
>     Amedee> That's 3.7:1, and it's increasing.
>
> One of the reasons I can keep a nearly 1:1 ratio is that when it gets a bit
> out of whack I simply delete some old spam.  In my experience the nature of
> spam changes over time while the nature of ham rarely does.  I also use
> train-to-exhaustion which only trains in fixed ratios.
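
To put numbers on the ratio discussed above, here is a minimal sketch using the counts from the spamstats output (2415 spam, 651 ham). The helper name spam_to_delete and the target_ratio parameter are hypothetical, made up for illustration; they are not part of SpamBayes or of anyone's actual script.

```python
def spam_to_delete(spam_count, ham_count, target_ratio=1.0):
    """Return how many spam messages to drop so that spam:ham <= target_ratio.

    Hypothetical helper for illustration only; not a SpamBayes function.
    """
    allowed_spam = int(ham_count * target_ratio)
    return max(0, spam_count - allowed_spam)


spam, ham = 2415, 651                 # counts from the spamstats output
print(round(spam / ham, 1))           # current spam:ham ratio
print(spam_to_delete(spam, ham))      # spam to remove for a 1:1 ratio
```

With these counts the ratio comes out at about 3.7, and roughly 1764 of the oldest spam messages would have to go to get back to 1:1.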

I don't keep any spam at all. Should I? I don't understand.

From the beginning I trained SpamBayes only on mistakes and unsures.
After about a week, it performed so well that I was confident enough to
/dev/null anything with a spam score of 100%.
Anything with a score between 90% and 99% gets moved to my spam training
folder (after vgrepping it for hammy clues) and after my training script
is run by a daily cron job, the training folders are purged.
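
As a rough sketch of the routing described above, assuming the score is available as a rounded percentage: the function name route_message and the action names are hypothetical, and what happens below 90% is an assumption, since only the two upper bands are spelled out here.

```python
def route_message(score_percent):
    """Map a rounded spam score (0-100) to an action name.

    Hypothetical sketch; the thresholds follow the description in this
    message, the action names are invented for illustration.
    """
    if not 0 <= score_percent <= 100:
        raise ValueError("score must be between 0 and 100")
    if score_percent == 100:
        return "discard"        # goes straight to /dev/null
    if score_percent >= 90:
        return "train_as_spam"  # spam training folder, after a manual check
    return "deliver"            # assumption: anything lower is left alone


for score in (100, 95, 42):
    print(score, route_message(score))
```

The manual check for hammy clues and the daily purge of the training folders are not modelled; they would happen outside a function like this.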

The only place where I have old spam is inside the token database. Do you
mean you delete the old tokens? How do you do that? Or do you keep your
old spam and retrain from scratch every so often?

-- 
Amedee


