[Spambayes] More back-patting - my brain's first FP where bayes got it right
Rob Hooft
rob@hooft.net
Tue Nov 19 22:41:14 2002
Skip Montanaro wrote:
> Tim> [Toby Dickenson]
> >> Why exclude spams that score 100 from training? Even these really
> >> spammy spams might contain clues that would help to classify other
> >> more marginal spam.
>
> Tim> Absolutely, but that's a different experiment. I've already done
> Tim> "proper" training and know it works great for me. These are
> Tim> experiments in doing silly training.
>
> If you're taking notes on this in various files in CVS I wouldn't call it
> "silly training". How about "realistic training"?
Why realistic? Minimalistic?
I've seen my favorite strategy discussed, but I'd like to see more
statistics on it: after an initial training phase of at least 10-30
messages, train on all ham/spam messages automatically, without any
further user interaction. This should adapt automatically to gradual
changes. If this really worked, it would be my realistic variant...
Integration into the MUA could only make it better.
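A minimal sketch of that fully automatic scheme. The word-set "classifier" here is a toy stand-in, not the real Spambayes scorer, and the function name and signature are made up for illustration:

```python
# Sketch of fully automatic training: after an initial seed phase that
# uses true labels, every later message is trained on with whatever
# classification it received, with no user interaction.
# The classifier is a trivial stand-in, not the real Spambayes API.

def auto_train(messages, initial=10):
    """messages: list of (text, is_spam) pairs.
    Returns the guesses made after the initial training phase."""
    spam_words, ham_words = set(), set()

    def train(text, is_spam):
        (spam_words if is_spam else ham_words).update(text.split())

    def score(text):
        words = text.split()
        s = sum(w in spam_words for w in words)
        h = sum(w in ham_words for w in words)
        return s / (s + h) if s + h else 0.5

    predictions = []
    for i, (text, is_spam) in enumerate(messages):
        if i < initial:
            train(text, is_spam)   # seed phase: use the true label
        else:
            guess = score(text) > 0.5
            predictions.append(guess)
            train(text, guess)     # self-training: trust the guess
    return predictions
```

The obvious risk, and why statistics would be interesting, is drift: once a message is misclassified, it is trained on with the wrong label.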
Hm. I just adapted weaktest to be a bit more flexible, so that all
these strategies can be tested. The weaktest.py program takes four
new flags:
-d <key>: selects the "decisionmaker", i.e. the strategy used to decide
whether a message is trained on. The choices are:
    all             : train on all messages
    allbut0and100   : train on all spam scoring < 0.995 and all ham
                      scoring > 0.005
    unsureandfalses : train on Unsures and fp/fn only
    unsureonly      : train on Unsures only
-u <key>: selects the "update strategy".
    always    : updates counts after every trained message
    sometimes : updates counts only after every 10th trained message
-m <int> : uses the first <int> messages for training only (default 10)
-v : increases verbosity.
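The four decisionmakers could be expressed as simple predicates. This is only a sketch: the `(score, is_spam, correct)` signature, the ham/spam cutoffs, and the `should_train` helper are assumptions, not the actual weaktest.py code; only the strategy names and the 0.005/0.995 thresholds come from the flags above:

```python
# Hedged sketch of the four training-decision strategies.
# score: classifier output in [0.0, 1.0]; is_spam: true label;
# correct: whether the classification matched the true label.
HAM_CUTOFF, SPAM_CUTOFF = 0.2, 0.9   # assumed cutoffs for the Unsure band

def is_unsure(score):
    return HAM_CUTOFF <= score <= SPAM_CUTOFF

DECISIONMAKERS = {
    # train on every message
    "all":             lambda score, is_spam, correct: True,
    # skip only the extreme scores: spam at >= 0.995, ham at <= 0.005
    "allbut0and100":   lambda score, is_spam, correct:
                           (score < 0.995) if is_spam else (score > 0.005),
    # train on Unsures plus false positives/negatives
    "unsureandfalses": lambda score, is_spam, correct:
                           is_unsure(score) or not correct,
    # train on Unsures only
    "unsureonly":      lambda score, is_spam, correct: is_unsure(score),
}

def should_train(key, score, is_spam, correct):
    return DECISIONMAKERS[key](score, is_spam, correct)
```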
I'm open to ideas (and results).
Rob
--
Rob W.W. Hooft || rob@hooft.net || http://www.hooft.net/people/rob/