[Spambayes] training suggestions

skip at pobox.com
Thu Aug 3 22:31:56 CEST 2006


    Dhaval> So now I am confused what the -f option is for. I don't use it
    Dhaval> because I don't want to retrain everything, just the ones that
    Dhaval> have NOT been trained yet. Am I wrong in assuming this?

My fault.  Yes, if you want to do incremental training with sb_mboxtrain,
then leave off the -f flag.  The fact that you are training once a day threw
me though.  If incremental training is what you want to do, why not run
sb_mboxtrain more frequently than daily?
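
To make the difference concrete, here is a minimal Python sketch of the
"skip what was already trained unless forced" idea.  It is not sb_mboxtrain's
code.  The header name, needs_training() and mark_trained() are assumptions
made up for illustration; the real script's header is configurable and its
internals may differ.

```python
from email.message import Message

# Assumed marker header; the name is an illustration, not a guarantee.
TRAINED_HEADER = "X-Spambayes-Trained"


def needs_training(msg, is_spam, force=False):
    """Return True if msg should be trained as spam (or ham).

    Without force, a message already marked with the same class is
    skipped, which is what makes repeated runs incremental.
    """
    if force:
        return True
    wanted = "spam" if is_spam else "ham"
    return msg.get(TRAINED_HEADER) != wanted


def mark_trained(msg, is_spam):
    """Record on the message itself which class it was trained as."""
    del msg[TRAINED_HEADER]  # no error if the header is absent
    msg[TRAINED_HEADER] = "spam" if is_spam else "ham"
```

With a marker like this, a frequent run only touches new or reclassified
messages, while a forced run walks over everything again.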

    Dhaval> Now that I think about it, the best thing would be to do the
    Dhaval> following:

    Dhaval> 1. message comes in to the filter
    Dhaval> 2. filter sorts it as ham or spam
    Dhaval> 3. db is trained with this message ham or spam (just this one
    Dhaval>    message) when the user sorts messages (put spam from inbox ->
    Dhaval>    spam folder)
    Dhaval> 4. the changes to the db made in step 3 get undone and the
    Dhaval>    message gets trained as spam (similarly if the user moved
    Dhaval>    from spam folder -> inbox)

    Dhaval> Can spambayes do this? Can I specify just a message id or a list
    Dhaval> of message ids? ( I use maildir format mail storage)

I'm not sure.  Again, that's not the way I work.  Nothing is ever trained
automatically in my personal environment.  If I see a message that is
misclassified (either false negative, unsure or false positive) then I toss
it into the appropriate training database.  I never train on a message which
was properly classified.  I rerun tte sort of when I feel like it, maybe a
few times a day, especially if I'm actively fiddling with things as I am now
or if I've tossed out my training database altogether and am starting from
scratch.
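
The "undo, then retrain as the other class" step from the quoted list can be
sketched roughly as follows.  This is a toy token-count store, not the
SpamBayes classifier; ToyStore, its methods and correct() are hypothetical
names, and it assumes the message really was trained under the old class
before being corrected.

```python
from collections import Counter


class ToyStore:
    """Hypothetical per-class token counts (illustration only)."""

    def __init__(self):
        self.counts = {True: Counter(), False: Counter()}  # keyed by is_spam
        self.nmsgs = {True: 0, False: 0}

    def learn(self, tokens, is_spam):
        # Count each distinct token once per message.
        self.counts[is_spam].update(set(tokens))
        self.nmsgs[is_spam] += 1

    def unlearn(self, tokens, is_spam):
        # Reverse an earlier learn() of the same message and class.
        counts = self.counts[is_spam]
        for tok in set(tokens):
            counts[tok] -= 1
            if counts[tok] <= 0:
                del counts[tok]
        self.nmsgs[is_spam] -= 1


def correct(store, tokens, was_spam):
    """The user moved the message: undo the old training, apply the new."""
    store.unlearn(tokens, was_spam)
    store.learn(tokens, not was_spam)
```

For example, a message first trained as ham and then dragged to the spam
folder would be handled with correct(store, tokens, was_spam=False).  Doing
this per message requires remembering how each message was last trained,
which is the bookkeeping the question is really about.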

Note also that I have the luxury of having a user population of one person.
Sounds like you aren't so fortunate.

Skip

