[spambayes-dev] Another incremental training idea...
T. Alexander Popiel
popiel at wolfskeep.com
Tue Jan 13 18:21:45 EST 2004
In message: <MHEGIFHMACFNNIMMBACACEEMHDAA.nobody at spamcop.net>
"Seth Goodman" <nobody at spamcop.net> writes:
>
>I am fooling around with variable expiration times based on how "wrong" a
>particular message classification was to see if I can possibly keep a
>reduced training set up-to-date automatically and if that is, in fact, a
>reasonable thing to do at all. Maybe the pure train on almost everything
>regime with a long time expiration (like Alex's four months) will be the
>ultimate?
It occurs to me that we need to start being careful about how we talk
about expiry. The expiry that I've tested with the harness is based on
taking trained messages back out of the database after a certain length
of time. However, in real-life usage, I'm completely rebuilding the
database every night with a 4 month horizon (and likely training on a
noticeably different collection of messages each night).
My testing has shown that the former approach is worse than not expiring
at all for my data... but I haven't even attempted to rigorously test
the latter approach (and I shudder at the amount of CPU it would take
to test, too!). I'm _guessing_ that the latter approach would perform
better than the former, however.
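To make the two meanings of "expiry" concrete, here is a minimal
sketch of both. It is not the test harness and does not use the
spambayes API: ToyDB, expire_incrementally, and rebuild_with_horizon
are hypothetical names, and the "database" is just per-token message
counts. Days are plain integers and the horizon is in days.

```python
from collections import Counter


class ToyDB:
    """Hypothetical stand-in for a token database (not the spambayes one)."""

    def __init__(self):
        self.spam = Counter()   # token -> number of spam messages containing it
        self.ham = Counter()    # token -> number of ham messages containing it
        self.nspam = 0
        self.nham = 0

    def train(self, tokens, is_spam):
        counts = self.spam if is_spam else self.ham
        counts.update(set(tokens))
        if is_spam:
            self.nspam += 1
        else:
            self.nham += 1

    def untrain(self, tokens, is_spam):
        counts = self.spam if is_spam else self.ham
        for tok in set(tokens):
            counts[tok] -= 1
            if counts[tok] <= 0:
                del counts[tok]     # keep the table free of zero entries
        if is_spam:
            self.nspam -= 1
        else:
            self.nham -= 1


def expire_incrementally(db, trained, today, horizon):
    """Former approach: take old trained messages back out of the database.

    `trained` is a list of (day, tokens, is_spam) already trained into `db`.
    Returns the list of messages that remain trained.
    """
    kept = []
    for day, tokens, is_spam in trained:
        if today - day > horizon:
            db.untrain(tokens, is_spam)
        else:
            kept.append((day, tokens, is_spam))
    return kept


def rebuild_with_horizon(corpus, today, horizon):
    """Latter approach: throw the database away and retrain from scratch
    on every message that falls inside the horizon."""
    db = ToyDB()
    for day, tokens, is_spam in corpus:
        if 0 <= today - day <= horizon:
            db.train(tokens, is_spam)
    return db
```

In this toy, which trains on every message, both routes end with the
same counts. They can only diverge when the choice of which messages
get trained depends on the state of the database at the time, which
is the case where a nightly rebuild may pick a different collection
of messages.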
Argh!
- Alex