[Spambayes] training suggestions
skip at pobox.com
Thu Aug 3 22:31:56 CEST 2006
Dhaval> So now I am confused what the -f option is for. I don't use it
Dhaval> because I don't want to retrain everything, just the ones that
Dhaval> have NOT been trained yet. Am I wrong in assuming this?
My fault. Yes, if you want to do incremental training with sb_mboxtrain,
then leave off the -f flag. The fact that you are training once a day threw
me though. If incremental training is what you want to do, why not run
sb_mboxtrain more frequently than daily?
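Roughly, the idea behind incremental training is that each message gets marked once it has been trained on, and later runs skip anything already marked unless told to force it. Here is a toy sketch of that idea in Python. It is not the sb_mboxtrain code: the header name, train_folder and make_message are all made up for illustration, and the "database" is just a pair of word counters.

```python
from collections import Counter
from email.message import EmailMessage

# Hypothetical marker header; the name SpamBayes really uses may differ.
TRAINED_HEADER = "X-Toy-Trained"


def train_folder(messages, is_spam, counts, force=False):
    """Train on each message at most once; return how many were trained."""
    label = "spam" if is_spam else "ham"
    trained = 0
    for msg in messages:
        if msg[TRAINED_HEADER] is not None and not force:
            continue  # already trained on an earlier run, skip it
        counts[label].update(msg.get_content().lower().split())
        del msg[TRAINED_HEADER]  # no error if the header is absent
        msg[TRAINED_HEADER] = label
        trained += 1
    return trained


def make_message(text):
    """Build a plain-text message (helper for the example only)."""
    msg = EmailMessage()
    msg.set_content(text)
    return msg
```

In this toy, a second run over the same folder trains nothing, and only newly arrived messages get picked up. Forcing ignores the marker, so it only makes sense against a fresh, empty set of counters; otherwise every message is counted twice.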
Dhaval> Now that I think about it, the best thing would be to do the
Dhaval> following:
Dhaval> 1. message comes in to the filter
Dhaval> 2. filter sorts it as ham or spam
Dhaval> 3. db is trained with this message ham or spam (just this one
Dhaval> message) when the user sorts messages (put spam from inbox ->
Dhaval> spam folder)
Dhaval> 4. the changes to the db made in step 3 get undone and the
Dhaval> message gets trained as spam (similarly if the user moved
Dhaval> from spam folder -> inbox)
Dhaval> Can spambayes do this? Can I specify just a message id or a list
Dhaval> of message ids? ( I use maildir format mail storage)
I'm not sure. Again, that's not the way I work. Nothing is ever trained
automatically in my personal environment. If I see a message that is
misclassified (either false negative, unsure or false positive) then I toss
it into the appropriate training database. I never train on a message which
was properly classified. I rerun tte sort of when I feel like it, maybe a
few times a day, especially if I'm actively fiddling with things as I am now
or if I've tossed out my training database altogether and am starting from
scratch.
Note also that I have the luxury of having a user population of one person.
Sounds like you aren't so fortunate.
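For what it's worth, the train-then-correct scheme in the numbered steps quoted above could be sketched like this. It is a toy, not the SpamBayes API: ToyTrainer and its methods are hypothetical, and it assumes you keep a record of how each message id was last trained so the earlier training can be undone before retraining the other way.

```python
from collections import Counter


class ToyTrainer:
    """Toy token-count store; hypothetical, not the SpamBayes classifier."""

    def __init__(self):
        self.counts = {"ham": Counter(), "spam": Counter()}
        self.trained_as = {}  # message id -> label it was last trained under

    def train(self, msg_id, text, is_spam):
        label = "spam" if is_spam else "ham"
        previous = self.trained_as.get(msg_id)
        if previous == label:
            return  # already trained this way, nothing to do
        tokens = text.lower().split()
        if previous is not None:
            # Undo the earlier training before retraining the other way.
            self.counts[previous].subtract(tokens)
            # Unary plus drops tokens whose count fell to zero.
            self.counts[previous] = +self.counts[previous]
        self.counts[label].update(tokens)
        self.trained_as[msg_id] = label
```

The filter would call train() with its own verdict when a message arrives, and whatever notices the user moving a message between folders would call train() again with the corrected label. Whether training on every message automatically is a good idea is a separate question.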