[spambayes-dev] RE: [Spambayes] How low can you go?
spambayes at whateley.com
Wed Dec 17 14:08:52 EST 2003
Couldn't we maintain (and use) a synthetic #messages value that is generated
using the average number of tokens/message? That way, as tokens are removed
from the database, the synthetic number could be adjusted to match. It seems
(and I don't have time to think about it now, have to go pay the dog license!)
that such a scheme would work quite well alongside the "remove old tokens"
scheme that ages out unused tokens. It probably doesn't matter whether the
number is exact, provided the DB doesn't contain far too few tokens.
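
To make the idea concrete, here is a rough sketch of what I mean. It is not
SpamBayes code: the function name, the dict layout, and the example numbers
are all made up for illustration. It assumes each token's count is the number
of trained messages containing that token, so the sum of the counts divided by
the average tokens/message gives back a message count.

```python
def synthetic_message_count(token_counts, avg_tokens_per_message):
    """Estimate a message count from the token counts still in the DB.

    Hypothetical helper, not part of SpamBayes.
    token_counts maps token -> number of trained messages containing it.
    avg_tokens_per_message is measured once, before any tokens are removed.
    """
    if avg_tokens_per_message <= 0:
        raise ValueError("avg_tokens_per_message must be positive")
    return sum(token_counts.values()) / avg_tokens_per_message


# Illustrative numbers only: 10 trained spam messages, 4 tokens in the DB.
nspam_trained = 10
spam_counts = {"viagra": 8, "free": 6, "click": 4, "xyzzy": 2}

# Average tokens/message, captured while the real count is still accurate.
avg_tokens = sum(spam_counts.values()) / nspam_trained

# Before any expiry the synthetic value matches the real count.
before = synthetic_message_count(spam_counts, avg_tokens)

# Expire an old, unused token; the synthetic count shrinks with the DB.
expired = dict(spam_counts)
del expired["xyzzy"]
after = synthetic_message_count(expired, avg_tokens)

print(before, after)
```

The synthetic value would then stand in for the trained message count wherever
the classifier needs one. Whether the drift it introduces is harmless would
still have to be tested.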
On Wednesday 17 December 2003 11:00 am, Seth Goodman wrote:
> [Bill Yerazunis]
> > Here's a particularly cute solution I implemented in CRM114.
> > Choose the tokens to decrement randomly. REALLY randomly. Don't
> Does CRM114 use the number of trained ham and trained spam *messages* as
> variables in its probability calculation? If not, then you wouldn't expect
> that deleting infrequently used tokens would do much damage. AFAIK,
> SpamBayes uses the trained message counts in the probability calculation
> and those become inaccurate if you delete individual tokens.
> Seth Goodman
> Humans: off-list replies to sethg [at] GoodmanAssociates [dot] com
> Spambots: disregard the above