[Spambayes] aging information

Tim Stone - Four Stones Expressions tim at fourstonesExpressions.com
Mon Feb 17 09:38:34 EST 2003

2/17/2003 9:30:50 AM, "D. R. Evans" <N7DR at arrisi.com> wrote:

>Does spambayes have any concept that "the older information is, the 
>less value it has"?

There was a huge discussion about this topic toward the end of the research 
phase of the project, maybe around October last year.  At that time we decided 
not to implement this functionality, for a whole bunch of reasons that I 
can't remember now; maybe some of the other guys have a better memory than 
I do.  But I think it revolved around the idea that while the overall 
content and organization of spam certainly will evolve, the tokens (e.g. 
words) used in spam come from a basically finite set, and don't 
evolve in the same way that combinations of tokens (whole spam messages) 
evolve.  Since spambayes is completely focused on tokens, aging was deemed 
unnecessary.  That's to the best of my recollection...

We are beginning to see spammers attempt to alter their tokens to fool 
Bayesian filters (a technology that they have nothing but fear for).  This 
tells us that we're already having an effect.  We'll see what ideas they come 
up with, and adjust our tokenizer to meet those challenges.

- TimS

>I ask because one's notion of what constitutes spam could change over 
>time, so it would seem reasonable to gradually (over a period of 
>weeks/months) decrease the effect of old training. (Probably by some 
>sort of exponential decay, so that, for example, every day the value of 
>old material is shifted slightly toward 0.5 by multiplying the 
>difference from 0.5 by a factor of 0.99, or some such algorithm.)
>  Doc Evans
>PS I am a new user (following the "Linux Journal" article). I am 
>impressed at how well spambayes works. So far (after a few days) it has 
>not classified any ham as spam, and it is now catching about three 
>quarters of the spam.
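
For illustration only, the decay Doc Evans describes in the quoted text could 
be sketched roughly as below.  This is not something spambayes does (as noted 
above, aging was not implemented), and it assumes each token carries a single 
spam score between 0 and 1, which may not match how the classifier actually 
stores its training data.  The names decay_toward_neutral and half_life_days 
are made up for this sketch.

```python
import math

NEUTRAL = 0.5  # a token with this score says nothing either way


def decay_toward_neutral(prob, days, factor=0.99):
    """Move prob toward 0.5 by multiplying its distance from 0.5
    by `factor` once for each elapsed day (hypothetical helper)."""
    return NEUTRAL + (prob - NEUTRAL) * factor ** days


def half_life_days(factor=0.99):
    """Days until a score's distance from 0.5 has halved (hypothetical helper)."""
    return math.log(0.5) / math.log(factor)
```

With the suggested factor of 0.99, a score's distance from 0.5 would halve 
roughly every 69 days, so a token scored 0.9 would drift to about 0.80 after 
30 days without fresh training.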

c'est moi - TimS
