[spambayes-dev] RE: [Spambayes] question regarding training

Kenny Pitt kennypitt at hotmail.com
Thu Aug 12 16:29:25 CEST 2004


Switching over to the dev list...

Tony Meyer wrote:
>> If we are, in fact, talking about the Outlook add-in then it is very
>> difficult to do anything besides "train on mistakes".
> 
> Except for initial training, when the obvious (from what is presented, I
> think) choice is train-on-everything.  (i.e. the Wizard asks for mail
> you already have stored, and does TOE on that.

Since I never run the config wizard myself, I had forgotten about that.  I
think most casual users fall into one of two camps:  either they have every
good message they've ever received still sitting in their Inbox, or they get
far more spam than ham.  Either way, initial training is likely to result in
a significant imbalance.  I doubt that most users pay any attention to how
many of each type of message are in their initial training set.  Does the
Wizard give any kind of warning during initial training if there is a
significant imbalance in the selected messages?
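
For illustration only, a check of the kind asked about here could simply compare the two counts before training starts. This is a hypothetical sketch, not code from the SpamBayes wizard. The function name `imbalance_warning` and the 5:1 threshold are assumptions.

```python
def imbalance_warning(n_ham, n_spam, max_ratio=5.0):
    """Hypothetical pre-training check (not part of SpamBayes).

    Returns a warning string when one class outnumbers the other by more
    than max_ratio (an arbitrary, assumed threshold), otherwise None.
    """
    if n_ham == 0 or n_spam == 0:
        return "initial training needs at least one ham and one spam message"
    ratio = max(n_ham, n_spam) / min(n_ham, n_spam)
    if ratio > max_ratio:
        larger = "ham" if n_ham > n_spam else "spam"
        return f"selected messages are {ratio:.1f}:1 in favour of {larger}"
    return None
```

For example, `imbalance_warning(1000, 50)` returns "selected messages are 20.0:1 in favour of ham".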

The config wizard seems to me to encourage initial training on existing
messages.  Since training on mistakes and unsures starting from an empty
database has proven so effective for most of us, I wonder whether it would
be better to recommend that method instead.
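
Training on mistakes and unsures comes down to one decision per message: train only if the filter did not already put it confidently in the right class. This is a minimal sketch, assuming a score in [0, 1] and example cutoffs of 0.20 and 0.90. Treat those values as placeholders, not necessarily what your configuration uses. `should_train` is a hypothetical helper.

```python
def should_train(score, is_spam, ham_cutoff=0.20, spam_cutoff=0.90):
    """Hypothetical train-on-mistakes-and-unsures rule.

    Assumed verdicts: score < ham_cutoff is ham, score >= spam_cutoff is
    spam, anything in between is unsure.
    """
    if is_spam:
        # Train unless the filter already called this message spam.
        return score < spam_cutoff
    # Train unless the filter already called this message ham.
    return score >= ham_cutoff
```

Suppose an empty database scores every message as unsure (for example 0.5, an assumption about the filter). Then the first messages of each type always get trained. After that, a message is trained only when the filter misclassifies it or is unsure about it.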

> It would be
> interesting - and I will try this when I get time - to get it to do
> TTE instead).

I think TTE (train to exhaustion) would be an excellent choice for initial
training, far better than train-on-everything given the likely disparity in
the number of available messages of each type.
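
One way to read train to exhaustion is as repeated train-on-mistakes-and-unsures passes over the stored messages, continuing until a full pass needs no training. The sketch below assumes that reading. It is not the implementation Tony mentions. It uses a hypothetical toy word-counting scorer (`ToyClassifier`) in place of the real SpamBayes classifier, and the cutoffs and pass limit are assumed values.

```python
from collections import Counter
from itertools import zip_longest


class ToyClassifier:
    """Hypothetical stand-in for a real spam filter (not the SpamBayes API)."""

    def __init__(self):
        self.spam_words = Counter()  # word -> number of spam messages containing it
        self.ham_words = Counter()   # word -> number of ham messages containing it
        self.nspam = 0
        self.nham = 0

    def learn(self, words, is_spam):
        if is_spam:
            self.spam_words.update(set(words))
            self.nspam += 1
        else:
            self.ham_words.update(set(words))
            self.nham += 1

    def score(self, words):
        """Average per-word spam probability; 0.5 (unsure) when nothing is known."""
        probs = []
        for w in set(words):
            s = self.spam_words[w] / self.nspam if self.nspam else 0.0
            h = self.ham_words[w] / self.nham if self.nham else 0.0
            if s + h > 0:
                probs.append(s / (s + h))
        return sum(probs) / len(probs) if probs else 0.5


def train_to_exhaustion(hams, spams, ham_cutoff=0.20, spam_cutoff=0.90,
                        max_passes=10):
    """Repeat train-on-mistakes-and-unsures passes until one pass trains nothing."""
    clf = ToyClassifier()
    # Alternate ham and spam so the larger class does not dominate the order.
    messages = []
    for h, s in zip_longest(hams, spams):
        if h is not None:
            messages.append((h, False))
        if s is not None:
            messages.append((s, True))
    trained = 0
    for _ in range(max_passes):  # cap passes in case training never settles
        trained_this_pass = 0
        for words, is_spam in messages:
            score = clf.score(words)
            needs_training = score < spam_cutoff if is_spam else score >= ham_cutoff
            if needs_training:
                clf.learn(words, is_spam)
                trained_this_pass += 1
        trained += trained_this_pass
        if trained_this_pass == 0:
            break
    return clf, trained


# Usage: 50 near-identical hams against 5 near-identical spams.
hams = [["meeting", "agenda", f"h{i}"] for i in range(50)]
spams = [["viagra", "cheap", f"s{i}"] for i in range(5)]
clf, trained = train_to_exhaustion(hams, spams)
print(trained, clf.nham, clf.nspam)  # 2 1 1
```

Train-on-everything would add all 55 of these messages to the database. This sketch trains only one of each type, which is the balancing effect argued for above. Note that a message can be trained more than once across passes, so the pass cap guards against data that never settles.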

-- 
Kenny Pitt


