[spambayes-dev] RE: [Spambayes] How low can you go?

Brendon Whateley spambayes at whateley.com
Wed Dec 17 14:08:52 EST 2003

Couldn't we maintain (and use) a synthetic #messages value, generated from the 
average number of tokens per message?  That way, as tokens are removed from the 
database, the synthetic number could be adjusted to match.  It seems (and I 
don't have time to think about it now, have to go pay the dog license!) that 
such a scheme would work quite well alongside the "remove old tokens" scheme 
that ages out unused tokens.  It probably doesn't matter whether the number is 
exactly accurate, provided the DB doesn't end up containing far too few tokens.
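
A minimal sketch of the idea, under stated assumptions: the class and method 
names below are hypothetical and are not part of SpamBayes.  It assumes the 
database stores, per token, the number of trained messages that contained it, 
so that deleting a token removes that many (message, token) pairs.  The 
synthetic #messages is then the number of pairs still in the database divided 
by the average number of distinct tokens per trained message.

```python
# Hypothetical sketch; none of these names come from SpamBayes itself.

class SyntheticMessageCount:
    """Estimate a per-class message count from what is left in the token DB."""

    def __init__(self):
        self.messages_trained = 0  # real messages ever trained for this class
        self.pairs_trained = 0     # (message, distinct token) pairs ever trained
        self.pairs_in_db = 0       # pairs still represented in the database

    def train(self, tokens):
        """Record one trained message; each distinct token counts once."""
        n = len(set(tokens))
        self.messages_trained += 1
        self.pairs_trained += n
        self.pairs_in_db += n

    def expire(self, token_count):
        """Account for deleting a token whose stored count was token_count."""
        self.pairs_in_db = max(0, self.pairs_in_db - token_count)

    def average_tokens_per_message(self):
        if self.messages_trained == 0:
            return 0.0
        return self.pairs_trained / self.messages_trained

    def value(self):
        """Synthetic #messages: remaining pairs / average tokens per message."""
        avg = self.average_tokens_per_message()
        return self.pairs_in_db / avg if avg else 0.0


ham = SyntheticMessageCount()
ham.train(["meeting", "agenda", "lunch", "friday"])
ham.train(["meeting", "agenda", "budget", "report"])
ham.expire(1)  # the aging scheme drops "lunch" (seen in one message)
ham.expire(1)  # ... and "friday"
print(ham.value())  # 6 remaining pairs / 4 tokens per message = 1.5
```

With nothing expired the value equals the real trained count (2.0 here), and 
it shrinks in proportion as tokens are aged out.  Whether that proportional 
adjustment is the right one for the classifier is an open question.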


On Wednesday 17 December 2003 11:00 am, Seth Goodman wrote:
> [Bill Yerazunis]
> > Here's a particularly cute solution I implemented in CRM114.
> ---------snip----------------
> > Choose the tokens to decrement randomly.  REALLY randomly.  Don't
> Does CRM114 use the number of trained ham and trained spam *messages* as
> variables in its probability calculation?  If not, then you wouldn't expect
> that deleting infrequently used tokens would do much damage.  AFAIK,
> SpamBayes uses the trained message counts in the probability calculation
> and those become inaccurate if you delete individual tokens.
> --
> Seth Goodman
>   Humans:   off-list replies to sethg [at] GoodmanAssociates [dot] com
>   Spambots: disregard the above
