[Spambayes] filtering in the face of disk quotas or full disks
bill at parducci.net
Sat Mar 22 06:44:35 EST 2003
this issue, in combination with some of the manual processes posted to the list for maintaining db size and relevancy, has made me wonder whether spambayes should incorporate the ability to expire token/training info on a FIFO basis.
it seems that the most straightforward way to do this would be to timestamp each entry in the db and have a configurable param indicating how long the db should keep information before pruning it (presumably during the training process).
this would increase the size of the db, since the timestamps have to be stored, but it should make the size much more predictable. judging from some of the notes that i have seen on the list, mail more than a couple of months old doesn't add to the accuracy of the system (and in some cases can decrease it), so i don't see this as a detriment to the system's behavior (as long as the data life span is reasonable).
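a rough sketch of the idea, to make it concrete. everything here is hypothetical: `ExpiringTokenDB`, `max_age_days` and the record layout are made-up names for illustration and are not the actual spambayes classifier or storage API. it also simplifies things by stamping each token with the time it was last trained on, so it expires tokens that have gone unseen rather than doing a strict per-message FIFO.

```python
import time


class ExpiringTokenDB:
    """Hypothetical token store with age-based pruning (not the spambayes API)."""

    def __init__(self, max_age_days=60, clock=time.time):
        # configurable life span, converted to seconds
        self.max_age = max_age_days * 86400
        self.clock = clock
        # token -> [spam_count, ham_count, last_trained_timestamp]
        self.tokens = {}

    def train(self, words, is_spam):
        """Count each distinct word once, stamp it, then prune stale entries."""
        now = self.clock()
        for word in set(words):
            rec = self.tokens.setdefault(word, [0, 0, now])
            rec[0 if is_spam else 1] += 1
            rec[2] = now
        self.prune(now)

    def prune(self, now=None):
        """Delete tokens not trained on within max_age; return how many were removed."""
        now = self.clock() if now is None else now
        cutoff = now - self.max_age
        stale = [w for w, rec in self.tokens.items() if rec[2] < cutoff]
        for w in stale:
            del self.tokens[w]
        return len(stale)
```

the cost is one extra timestamp per token. a real implementation would presumably also have to adjust the overall spam/ham message counts when old training data is dropped, which this sketch ignores.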
just thinking out loud, but this seems like a step toward a 'set & forget' system.