[spambayes-dev] RE: [Spambayes] question regarding training

Seth Goodman sethg at GoodmanAssociates.com
Thu Aug 12 16:59:45 CEST 2004


> From: Kenny Pitt
> Sent: Wednesday, August 11, 2004 10:06 AM

<...>

> Unfortunately, automatic training is not as easy as it would
> seem in the Outlook add-in.  The add-in uses a different
> trainer so that it can keep track of the Outlook id's of
> trained messages, rescore trained messages, etc.  This makes
> it difficult to call the trainer from inside the classifier
> because the classifier is shared with the other SpamBayes apps such as
> sb_server.  I'm sure there are some simple modifications that
> could be made to the code structure so that this could be
> implemented, but I just haven't found the time yet to work
> out the details.

How about something dumb and ugly?  When a user trains a message and
SpamBayes sees an imbalance in the training set sizes, it puts up a
text info box recommending that the user train on N more ham (or spam)
to keep SpamBayes performing well.  The user selects the messages, moves
them to the unsure folder and trains as appropriate.  I did say it was
dumb and ugly.
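
To make the idea concrete, here is a rough sketch of the check in plain
Python.  Nothing here is actual SpamBayes or add-in code: the function
name, the max_ratio threshold of 2.0 and the wording of the message are
all assumptions for illustration, and the two counts would have to come
from wherever the add-in keeps its training totals.

```python
import math


def training_suggestion(nham, nspam, max_ratio=2.0):
    """Return text for the info box, or None if no nagging is needed.

    nham and nspam are the numbers of trained ham and spam messages.
    max_ratio is a hypothetical threshold: the largest larger/smaller
    ratio we tolerate before suggesting more training.
    """
    larger, smaller = max(nham, nspam), min(nham, nspam)
    # Balanced enough (or nothing trained at all): say nothing.
    if larger == smaller or (smaller > 0 and larger / smaller <= max_ratio):
        return None
    short_kind = "ham" if nham < nspam else "spam"
    # N = how many more messages bring the ratio back down to max_ratio.
    needed = math.ceil(larger / max_ratio) - smaller
    return (
        "You have trained on %d ham and %d spam. Consider training on "
        "%d more %s to keep SpamBayes performing well."
        % (nham, nspam, needed, short_kind)
    )
```

The add-in would call something like this after each manual train and
show the box only when the result is not None.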

This would be easier if you could train on a correctly classified
message without moving it to the unsure folder.  At present, there is no
"train as good" button in the ham folder and no "train as spam" in the
spam folder.  That might be a nice addition anyway.

As you say, automating this is not easy.  There are no folders of
confirmed ham or spam in the Outlook implementation to choose from.
With the dumb and ugly (tm) method, the user hand-picks the additional
ham or spam, so the messages trained on are known to be correctly
labelled.  The text box could further suggest that they train on the
messages that scored furthest from perfect classification, that is,
the ham with the highest spam scores or the spam with the lowest.
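
Picking the messages that scored furthest from perfect could look
something like the sketch below.  It assumes each candidate is a
(message_id, score) pair with the score on a 0.0 to 1.0 scale, where
0.0 is perfect ham and 1.0 is perfect spam; the function name and the
data layout are made up for illustration and are not taken from the
add-in.

```python
def furthest_from_perfect(messages, kind, n):
    """Return the n messages whose scores are furthest from the ideal.

    messages: list of (message_id, score) pairs, score in [0.0, 1.0]
              (an assumed layout, not the add-in's own).
    kind:     "ham" or "spam", the class the user says these belong to.
    """
    ideal = 0.0 if kind == "ham" else 1.0
    # Sort by distance from the ideal score, worst first.
    ranked = sorted(messages, key=lambda m: abs(m[1] - ideal), reverse=True)
    return ranked[:n]
```

The user still decides what is ham and what is spam; this only orders
the messages they have already sorted correctly so the most useful
ones are suggested first.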

--

Seth Goodman


