[Spambayes] More back-patting - my brain's first FP where bayes got it right

Rob Hooft rob@hooft.net
Tue Nov 19 22:41:14 2002


Skip Montanaro wrote:
>     Tim> [Toby Dickenson]
>     >> Why exclude spams that score 100 from training?  Even these really
>     >> spammy spams might contain clues that would help to classify other
>     >> more marginal spam.
> 
>     Tim> Absolutely, but that's a different experiment.  I've already done
>     Tim> "proper" training and know it works great for me.  These are
>     Tim> experiments in doing silly training.  
> 
> If you're taking notes on this in various files in CVS I wouldn't call it
> "silly training".  How about "realistic training"?

Why realistic? Minimalistic?

I've seen my favorite being discussed, but I'd like to see more 
statistics on it: after an initial training phase of at least 10-30 
messages, train on all ham/spam messages automatically, without any 
user interaction. This should automatically adapt to gradual changes. 
If this really worked, it would be my realistic variant... 
Integration into the MUA could only make it better.

Hm. I just adapted weaktest to be a bit more flexible, so that all of 
these strategies can be tested. There are four new flags for the 
weaktest.py program (a rough sketch of the decision strategies follows 
the list):

  -d <key>: selects the "decisionmaker", i.e. the strategy used to decide
       whether a message is trained on. There is a choice between:
        all : train on all messages
        allbut0and100 : train on all spam scoring < 0.995 and ham scoring > 0.005
        unsureandfalses : train on Unsure messages and fp/fn only
        unsureonly : train on Unsure messages only
  -u <key>: selects the "update strategy":
        always : updates counts after every trained message
        sometimes : updates counts only after every 10th trained message
  -m <int>: uses the first <int> messages for training only (default 10)
  -v: increases verbosity
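
Roughly, the four -d strategies amount to the following predicates over 
the score a message got before training (the cutoffs here are only 
illustrative; weaktest.py itself may use different numbers):

HAM_CUTOFF, SPAM_CUTOFF = 0.2, 0.9   # illustrative Unsure band

def is_unsure(score):
    return HAM_CUTOFF <= score <= SPAM_CUTOFF

def is_wrong(score, is_spam):
    # fn: spam scored as ham; fp: ham scored as spam
    return (is_spam and score < HAM_CUTOFF) or (not is_spam and score > SPAM_CUTOFF)

DECISIONMAKERS = {
    "all":             lambda score, is_spam: True,
    "allbut0and100":   lambda score, is_spam: (score < 0.995) if is_spam else (score > 0.005),
    "unsureandfalses": lambda score, is_spam: is_unsure(score) or is_wrong(score, is_spam),
    "unsureonly":      lambda score, is_spam: is_unsure(score),
}

# a test harness could then call DECISIONMAKERS[key](score, is_spam)
# to decide whether to feed the message back into training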

I'm open to ideas (and results).

Rob

-- 
Rob W.W. Hooft  ||  rob@hooft.net  ||  http://www.hooft.net/people/rob/
