[spambayes-dev] Another incremental training idea...

T. Alexander Popiel popiel at wolfskeep.com
Tue Jan 13 19:49:10 EST 2004

In message:  <MHEGIFHMACFNNIMMBACAOEFDHDAA.nobody at spamcop.net>
             "Seth Goodman" <nobody at spamcop.net> writes:
>[Seth Goodman]
>> I guess I don't understand why the two expiry approaches should be
>> different ...
>Oooops, sorry for the dumb question.   The result from creating a new
>training set every night does depend very much on how you score the four
>months of messages to determine which ones are nonedge, though.  How do you
>do this in your regime?

In my real-life usage, the rebuilding process scores each message against
the database it has built so far.  Having it score messages for the new
database against the old database instead might be another interesting
scenario, but I have absolutely no clue how effective that would be.
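A minimal sketch of the rebuild loop described above, assuming messages are replayed in chronological order and that a message counts as "edge" (and gets trained) whenever it is not confidently classified correctly. `TokenDB`, `rebuild`, the cutoff values and the smoothing constants are all hypothetical stand-ins, not the SpamBayes API, and the scorer is a plain average of per-token probabilities, a deliberate simplification of the real classifier's combining scheme. The optional `reference_db` argument covers the untested alternative of scoring against the old database.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical thresholds: scores between them count as "unsure".
HAM_CUTOFF = 0.20
SPAM_CUTOFF = 0.90


@dataclass
class TokenDB:
    """Toy stand-in for a training database (not the SpamBayes classes)."""
    spam_counts: Counter = field(default_factory=Counter)
    ham_counts: Counter = field(default_factory=Counter)
    nspam: int = 0
    nham: int = 0

    def train(self, tokens, is_spam):
        unique = set(tokens)  # count each token once per message
        if is_spam:
            self.spam_counts.update(unique)
            self.nspam += 1
        else:
            self.ham_counts.update(unique)
            self.nham += 1

    def token_prob(self, tok, s=0.45, x=0.5):
        # Robinson-style smoothing toward x, weighted by evidence count n.
        spam_ratio = self.spam_counts[tok] / max(self.nspam, 1)
        ham_ratio = self.ham_counts[tok] / max(self.nham, 1)
        if spam_ratio + ham_ratio == 0:
            return x  # never-seen token
        p = spam_ratio / (spam_ratio + ham_ratio)
        n = self.spam_counts[tok] + self.ham_counts[tok]
        return (s * x + n * p) / (s + n)

    def score(self, tokens):
        # Simplified combining: mean of the per-token probabilities.
        unique = set(tokens)
        if not unique:
            return 0.5
        return sum(self.token_prob(t) for t in unique) / len(unique)


def rebuild(messages, reference_db=None):
    """Build a fresh database, training only on edge messages.

    Each (tokens, is_spam) pair is scored against the database built so
    far, or against reference_db (the "old database" scenario) if given.
    Returns the new database and the number of messages trained.
    """
    db = TokenDB()
    trained = 0
    for tokens, is_spam in messages:
        scorer = reference_db if reference_db is not None else db
        score = scorer.score(tokens)
        confidently_right = score >= SPAM_CUTOFF if is_spam else score < HAM_CUTOFF
        if not confidently_right:
            db.train(tokens, is_spam)
            trained += 1
    return db, trained
```

Scoring against the database built so far means early messages are nearly always trained (an empty database scores everything as unsure), while later, well-covered messages are skipped; that ordering effect is exactly why the choice of scoring database matters for which messages end up as nonedge.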

In any case, these complete-rebuilding scenarios are predicated on
keeping at least the token list for every single message (not just
those that have been trained) for as long as we might want to train
on them.  There is some indication that a significant portion of the
userbase is unwilling to keep that much mail data lying around
(for months, presumably)... which makes the other form of expiry
of more practical interest.
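To make the storage contrast concrete, here is a minimal sketch assuming "the other form of expiry" means untraining each trained message once it ages past a fixed window. Under that assumption only the token lists of messages that were actually trained need to be kept, rather than every message in the window. `ExpiringDB`, `max_age` and the timestamp units are hypothetical, and `train` is assumed to be called in chronological order so the oldest message is always at the front of the history.

```python
from collections import Counter, deque


class ExpiringDB:
    """Hypothetical sketch: store tokens for trained messages only and
    untrain each one once it is older than max_age."""

    def __init__(self, max_age):
        self.max_age = max_age
        self.counts = {True: Counter(), False: Counter()}  # spam / ham token counts
        self.nmsgs = {True: 0, False: 0}
        self.history = deque()  # (timestamp, tokens, is_spam), oldest first

    def train(self, timestamp, tokens, is_spam):
        unique = frozenset(tokens)
        self.counts[is_spam].update(unique)
        self.nmsgs[is_spam] += 1
        self.history.append((timestamp, unique, is_spam))

    def expire(self, now):
        """Untrain every stored message older than max_age; return how many."""
        removed = 0
        while self.history and now - self.history[0][0] > self.max_age:
            _, unique, is_spam = self.history.popleft()
            counts = self.counts[is_spam]
            for tok in unique:
                counts[tok] -= 1
                if counts[tok] == 0:
                    del counts[tok]  # drop tokens no longer backed by any message
            self.nmsgs[is_spam] -= 1
            removed += 1
        return removed
```

Memory here grows with the number of trained messages in the window, whereas a complete rebuild needs the token list of every message in the window, trained or not.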

- Alex
