[Spambayes] How many is enough?

T. Alexander Popiel popiel at wolfskeep.com
Sun May 11 20:37:58 EDT 2003


In message:  <5.2.0.9.0.20030512015036.02010fe0 at localhost>
             Peter Bengtsson <mail at peterbe.com> writes:

>I've read the pages at http://spambayes.sourceforge.net/ now and concluded 
>that you should train your database, but not too much.
>What I fail to find is some numbers for this. Are we talking about hundreds 
>or thousands or millions?
>I've trained my database with 3000 ham and only 50 spam. That was basically 
>all I had available in my email client at the moment.
>
>So, how much should I train before I run the risk of overdoing it?

I've not had any problems with training with tens of thousands
of messages (6345 ham, 17063 spam for last night's retrain).
The only reason I don't train with my full 44219-message archive
is to control database size... and that really is a pretty minor
concern.  My incremental training tests showed that more was
better for accuracy...

- Alex


