[Spambayes] feature request

Seth Goodman nobody at spamcop.net
Sun Dec 7 17:51:45 EST 2003


Here's an interesting feature request that comes out of experimenting with
training schemes.  When you are in the unsure folder and hit either of the
buttons "Delete As Spam" or "Recover from Spam", it would be great if you
would re-filter the unsure folder.  In fact, I would argue that the unsure
folder should be re-classified after any training event.  If you want to
avoid unnecessary "overtraining" (training on messages whose tokens are
already represented in sufficient number in the correct database), one good
practice is to manually re-filter the unsure box after each additional
message that you train on.  Frequently, training one unsure message as spam
will push the scores of other messages in the unsure folder well into the
spam range, making it unnecessary to train on them.  Since we probably don't
remember to do this all the time (I sure don't), we wind up training on
messages that would now classify properly, thus increasing the size of the
(usually spam) database unnecessarily.  Since many knowledgeable people on
this list say that smaller databases seem to be better, which is reasonable,
this feature would be an aid to extending the useful life of any particular
training set.
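
To make the idea concrete, here is a rough sketch of the "re-filter after
every training event" loop.  It does not use the real Spambayes classifier
or the Outlook plug-in code: ToyClassifier, train_with_refilter and the
cutoff values are all hypothetical stand-ins I made up for illustration,
and the scoring is a crude token count, not the actual combining scheme.

```python
from collections import Counter

# Assumed cutoffs for illustration only; real values are user settings.
HAM_CUTOFF, SPAM_CUTOFF = 0.2, 0.9


class ToyClassifier:
    """Hypothetical stand-in for a token-based spam classifier."""

    def __init__(self):
        self.spam = Counter()  # token -> number of spam messages seen
        self.ham = Counter()   # token -> number of ham messages seen

    def train(self, tokens, is_spam):
        # Count each token once per message.
        (self.spam if is_spam else self.ham).update(set(tokens))

    def score(self, tokens):
        # Crude score: share of spam evidence among known tokens.
        unique = set(tokens)
        s = sum(self.spam[t] for t in unique)
        h = sum(self.ham[t] for t in unique)
        return 0.5 if s + h == 0 else s / (s + h)


def train_with_refilter(clf, unsure, decisions):
    """Train on unsure messages one at a time, re-scoring after each.

    unsure:    dict of message id -> token list
    decisions: dict of message id -> True (spam) / False (ham)
    Returns the ids that actually had to be trained on.
    """
    trained = []
    pending = dict(unsure)
    while pending:
        msg_id = next(iter(pending))
        tokens = pending.pop(msg_id)
        clf.train(tokens, decisions[msg_id])
        trained.append(msg_id)
        # Re-filter: keep only messages that still score as unsure.
        pending = {
            m: t for m, t in pending.items()
            if HAM_CUTOFF < clf.score(t) < SPAM_CUTOFF
        }
    return trained
```

With three unsure messages where the second shares its tokens with the
first, training the first as spam pushes the second out of the unsure
range, so only two of the three ever get trained on.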

If you really want to make it slick, after re-filtering the unsure folder,
move any messages out of that folder that re-classify as definite spam or
ham.  Ideally, these would be user-selectable parameters, but I feel both
features would be excellent default behavior.
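
A minimal sketch of how the two options might look, again with made-up
names: RefilterOptions, refile_unsure, the folder names and the default
cutoffs are assumptions for illustration, not existing Spambayes settings.

```python
from dataclasses import dataclass


@dataclass
class RefilterOptions:
    """Hypothetical user-selectable settings, both on by default."""
    refilter_after_training: bool = True  # re-score unsure after training
    move_reclassified: bool = True        # move definite spam/ham out
    ham_cutoff: float = 0.2               # assumed value, for illustration
    spam_cutoff: float = 0.9              # assumed value, for illustration


def refile_unsure(scores, options):
    """Decide where each re-scored unsure message should end up.

    scores: dict of message id -> new score in [0.0, 1.0]
    Returns a dict of folder name -> list of message ids.
    """
    folders = {"inbox": [], "spam": [], "unsure": []}
    for msg_id, score in scores.items():
        if not options.move_reclassified:
            # Feature switched off: everything stays where it is.
            folders["unsure"].append(msg_id)
        elif score >= options.spam_cutoff:
            folders["spam"].append(msg_id)
        elif score <= options.ham_cutoff:
            folders["inbox"].append(msg_id)
        else:
            folders["unsure"].append(msg_id)
    return folders
```

Keeping the move as a separate switch means someone who wants the fresh
scores but prefers to file messages by hand can turn only that part off.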

One more note on the unsure folder is that one of the buttons is labeled
"Recover from Spam".  Since none of the messages in the unsure folder have
been trained as spam, the "Recover from Spam" button is a bit misleading.
Though this is the same button that appears in spam folders, thus making the
code simpler, in the unsure folder it should probably be called "Train as
Good" or "Keep as Good".

--
Seth Goodman

  Humans:   off-list replies to sethg [at] GoodmanAssociates [dot] com

  Spambots: disregard the above
