[Spambayes] possible feature request: ham training from Unsure folder

Tue Nov 4 15:32:51 EST 2003

Seth Goodman wrote:
> The following is my guess at how the Unsure and Spam folders work,
> and if this is correct, I have a related feature request.  As I am
> new to SpamBayes, I welcome your corrections and explanations.
> 
> 1) If the message spam score is less than the ham threshold, the
> message is left in the watched folder.  No training is done with the
> message.  If the user then highlights that message and hits the
> "Delete As Spam" button, the message is moved to the Spam folder and
> it is trained on as spam. 

Yes.

> 2) If the message spam score is greater than the spam threshold, the
> message is moved from the watched folder into the Spam folder.  No
> training is done with the message.  If the user then highlights that
> message and hits the "Recover From Spam" button, the message is moved
> back to its original watched folder (not necessarily the Inbox) and
> it is trained on as ham. 

Still good so far, although dragging the message back to a watched
non-spam folder has the same effect.

> 3) If the message spam score is between the ham and spam thresholds,
> the message is moved from the watched folder to the Unsure folder. 
> No training is done with the message.  If the user then highlights
> that message and hits the "Delete As Spam" button, the message is moved
> to the Spam folder and it is trained on as spam.  If the user
> manually moves the message to a non-Spam folder, no training is done
> with the message. 

Almost.  If the user clicks "Delete As Spam" or drags the message to the
spam folder, then the message is trained as spam.  If the user clicks
"Recover From Spam" or drags the message to a watched non-spam folder,
then the message is trained as good.
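
To make the rules above concrete, here is a rough sketch in Python.  It
is not the actual SpamBayes code: the names (Folder, route,
training_label), the action strings, and the cutoff values are all made
up for illustration.  Treating a score exactly equal to a cutoff as
Unsure, and doing no training when a message is dragged anywhere other
than the spam folder or a watched non-spam folder, are my assumptions.

```python
from enum import Enum


class Folder(Enum):
    WATCHED = "watched"   # a watched non-spam folder, e.g. the Inbox
    UNSURE = "unsure"
    SPAM = "spam"
    OTHER = "other"       # any folder that is not watched (assumption)


# Hypothetical cutoffs; the real thresholds are user-configurable.
HAM_CUTOFF = 15.0
SPAM_CUTOFF = 90.0


def route(score, ham_cutoff=HAM_CUTOFF, spam_cutoff=SPAM_CUTOFF):
    """Pick the folder for a newly scored message. No training happens here."""
    if score < ham_cutoff:
        return Folder.WATCHED      # left where it arrived
    if score > spam_cutoff:
        return Folder.SPAM
    return Folder.UNSURE           # between the cutoffs (or equal: assumption)


def training_label(action, destination=None):
    """Return "spam", "ham", or None (no training) for a user action.

    action is one of the illustrative strings "delete_as_spam",
    "recover_from_spam", or "drag"; destination is only used for "drag".
    """
    if action == "delete_as_spam":
        return "spam"
    if action == "recover_from_spam":
        return "ham"
    if action == "drag":
        if destination is Folder.SPAM:
            return "spam"
        if destination is Folder.WATCHED:
            return "ham"
    return None                    # anything else: no training (assumption)


if __name__ == "__main__":
    for score in (5.0, 50.0, 95.0):
        print(score, route(score).value)
    print(training_label("drag", Folder.WATCHED))
```

The point of the sketch is that scoring alone never trains; only an
explicit user action does, and the button and the equivalent drag give
the same result.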

> If this is correct, I think it exposes a minor weakness.  This is
> based on the premise that SpamBayes should only train on messages
> that the system cannot already classify correctly.  This assumes that
> the reason for not training on all messages is, as other folks have
> pointed out, that classification accuracy suffers when the training
> corpus is too large (around 10K messages).  If these assumptions are
> correct, then why not train on *all* messages in the Unsure folder
> when the user manually classifies them?  The resulting feature
> request would be that when the Unsure folder is selected, SpamBayes
> should display *two* classification buttons:  "Delete As Spam" and
> "Keep As Good".  The "Keep As Good" button would work exactly like
> the "Recover From Spam" button in the Spam folder, that is, move the
> message back to its original watched folder and train on the message
> as ham.  

This is exactly what *should* be happening when you select your Unsure
folder.  You should see both the "Delete As Spam" and "Recover From
Spam" buttons.  If this isn't what you see, could you send the list a
copy of your spambayes1.log file, and maybe some screen captures of
your toolbar and of your filtering configuration in SpamBayes Manager?

-- 
Kenny Pitt