[Spambayes] aging information

Tim Stone - Four Stones Expressions tim at fourstonesExpressions.com
Mon Feb 17 10:58:10 EST 2003

2/17/2003 10:41:02 AM, "D. R. Evans" <N7DR at arrisi.com> wrote:

>On 17 Feb 2003 at 10:13, Tim Stone - Four Stones Expressions wrote:
>> Aging is a very difficult problem, because spambayes simply keeps track
>> of tokens and the number of times you've said that mail with each token
>> is spam and ham.  That's all the information we retain about tokens.  We
>You can still make it work. Every time you do a new train do something 
>like this:
>for each token in the database
>{ number of times this token has been in ham *= 0.99;
>  number of times this token has been in spam *= 0.99;
>}

It would be very simple for you to write a program that iterates over the database, 
performing the calculation you suggest.  You can use the runExport/runImport 
functions in dbExpImp.py as a jumping-off point.  In fact, you could simply 
implement it as another option on that module, say -a.  I'm not sure what happens 
if spamcount and hamcount become floats...  I (for one) would be interested to 
see an analysis of your results in terms of false positives and false 
negatives over time.
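A minimal sketch of the arithmetic such an aging pass might perform. It assumes the exported counts can be loaded into a plain dict that maps each token to a (spamcount, hamcount) pair. The names `age_counts`, `decay` and `min_count` are hypothetical and are not part of dbExpImp.py or the spambayes API. The sketch does not call runExport/runImport. It also assumes, without confirmation, that the totals of trained spam and ham messages (`nspam`, `nham`) should decay by the same factor, so that per-token ratios stay comparable.

```python
from typing import Dict, Tuple

# Hypothetical layout: token -> (spamcount, hamcount)
Counts = Dict[str, Tuple[float, float]]


def age_counts(counts: Counts, nspam: float, nham: float,
               decay: float = 0.99, min_count: float = 0.5):
    """Multiply every count by `decay`, as in the quoted proposal.

    Assumption: nspam/nham are the message totals and should be aged
    by the same factor as the per-token counts.
    """
    aged: Counts = {}
    for token, (spamcount, hamcount) in counts.items():
        spamcount *= decay
        hamcount *= decay
        # Drop tokens whose evidence has decayed to (almost) nothing.
        if spamcount >= min_count or hamcount >= min_count:
            aged[token] = (spamcount, hamcount)
    return aged, nspam * decay, nham * decay


counts = {"viagra": (40.0, 0.0), "meeting": (1.0, 25.0), "rare": (0.5, 0.0)}
aged, nspam, nham = age_counts(counts, nspam=100.0, nham=200.0)
print(aged)         # "rare" has fallen below min_count and is gone
print(nspam, nham)
```

The sketch keeps the counts as floats on purpose. Rounding them back to integers would stall the decay for rare tokens, because `round(1 * 0.99)` is still 1. Whether the rest of spambayes copes with float counts is the open question raised above.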

You might think about implementing it as a 'half life' algorithm.  This would 
allow users to choose their own aging period, i.e. the timescale over which their spam evolves.  
Some users' definition of spam may evolve very quickly, so they would want their 
database halflife to be quite short.  Others might prefer a very slow decay, with a long halflife.
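One way to turn the half-life idea into a per-step factor is sketched below. It assumes aging is applied once per step, where a step is one training run or one day. The user picks how many steps it should take for a count to fall to half, and the multiplier follows as 0.5 ** (1 / half_life). The function names are hypothetical. The reverse calculation shows that the quoted factor of 0.99 per training run corresponds to a half-life of roughly 69 runs.

```python
import math


def decay_factor(half_life: float) -> float:
    """Per-step multiplier that halves a count after `half_life` steps."""
    if half_life <= 0:
        raise ValueError("half_life must be positive")
    return 0.5 ** (1.0 / half_life)


def implied_half_life(factor: float) -> float:
    """Steps after which repeated multiplication by `factor` halves a count."""
    if not 0.0 < factor < 1.0:
        raise ValueError("factor must be strictly between 0 and 1")
    return math.log(0.5) / math.log(factor)


print(round(implied_half_life(0.99), 1))  # → 69.0 (0.99 per training run)
print(decay_factor(7))  # about 0.906: a one-week half-life if aging runs daily
```

Exposing the half-life rather than the raw factor gives users a setting they can reason about directly, such as "a spam token loses half its weight in a month".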

At any rate, we invariably measure the success of these kinds of things in 
terms of the false positive (fp) and false negative (fn) rates.  - TimS
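The fp rate is the share of ham classified as spam, and the fn rate is the share of spam classified as ham. The sketch below computes both. It assumes a hypothetical list of (actual, predicted) label pairs. Any prediction other than "ham" or "spam", such as "unsure", counts as neither error.

```python
from typing import Iterable, Tuple


def error_rates(results: Iterable[Tuple[str, str]]) -> Tuple[float, float]:
    """Return (fp_rate, fn_rate) from (actual, predicted) label pairs."""
    ham = spam = fp = fn = 0
    for actual, predicted in results:
        if actual == "ham":
            ham += 1
            if predicted == "spam":  # good mail wrongly flagged
                fp += 1
        elif actual == "spam":
            spam += 1
            if predicted == "ham":  # spam that slipped through
                fn += 1
    fp_rate = fp / ham if ham else 0.0
    fn_rate = fn / spam if spam else 0.0
    return fp_rate, fn_rate


week = [("ham", "ham"), ("ham", "spam"), ("ham", "ham"), ("ham", "unsure"),
        ("spam", "spam"), ("spam", "ham")]
print(error_rates(week))  # (0.25, 0.5)
```

To see whether aging helps, you could compute these two numbers per period with aging on and with aging off.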

>train as is currently done;
>Something like that ought to do the job, shouldn't it? That's what I 
>had in mind, anyway.
>  Doc

c'est moi - TimS
