[spambayes-dev] Incorrect Outlook stats

Tony Meyer tameyer at ihug.co.nz
Tue Nov 23 00:16:59 CET 2004


> I just started working on getting the extended statistics 
> into the Outlook addin, and I noticed something in the stat 
> tracking that isn't doing what I think it should.
> 
> My SpamBayes accuracy has been so good that I have had no 
> false positives or negatives since Tony added permanent 
> accumulation of the statistics.

That is good :)

> So, in order to test the 
> statistics for incorrect classifications, I trained one of my 
> good messages as spam and checked to see that it showed up as 
> a false negative.
> 
> I then trained the message back to good, but the statistics 
> still showed that I had one false negative.  It seems like 
> the correct behavior would be to erase the false negative if 
> the message is trained back to its original classification.

An odd case, to be sure :)

> I thought it would be a simple matter to check if we are 
> training back to the original classification and just 
> decrement the appropriate statistic. The non-Outlook apps 
> store the original classification in the message info db, but 
> it doesn't appear that the Outlook addin does this. Anyone 
> (Tony?) have any suggestions on how to go about fixing this?

This is similar to what I needed to do to get the original
score/classification in the "show clues" message.  In
manager.classifier_data.message_db we store the trained status, but not the
classification.  However, we do store the original score in the "Spam" field
(unless that option is turned off, or there was a problem doing so), and can
figure it out from there*.
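
As a rough illustration of recovering the classification from the stored
score, here is a minimal sketch. The function name, the 0-100 score scale and
the two threshold parameters are assumptions made for the example, not the
addin's actual code or option names.

```python
def classification_from_score(score, unsure_threshold, spam_threshold):
    """Map a stored score to 'ham', 'unsure' or 'spam'.

    Hypothetical helper: assumes the score and both thresholds use the
    same scale (e.g. 0-100). Returns None if no score was stored.
    """
    if score is None:
        # The option was turned off, or saving the score failed.
        return None
    if score >= spam_threshold:
        return "spam"
    if score >= unsure_threshold:
        return "unsure"
    return "ham"
```

The result is only as good as the thresholds passed in: if they differ from
the ones in force when the message was scored, the answer may be wrong.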

However, when we train via the buttons we rescore the message, which changes
this field, so that data is lost.

AFAICT the only way** to fix this would be to add more information to the
message_db (à la the non-Outlook version).  I believe we can do this in a
backwards-compatible way, although I suspect there will be a reasonable
number of changes.  Should I go ahead and do this?
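
To make the idea concrete, here is one possible shape for it, as a sketch
only. It assumes the message_db behaves like a dict keyed by message id, that
old entries hold just the trained flag, and that new entries hold a
(trained_as_spam, original_classification) tuple. The function names and the
stats dict are hypothetical and do not reflect the real message_db format.

```python
def read_entry(message_db, msg_id):
    """Return (trained_as_spam, original_classification).

    Old-style entries (assumed to hold only the trained flag) yield
    None for the original classification, so they remain readable.
    """
    value = message_db[msg_id]
    if isinstance(value, tuple):
        return value
    return value, None


def error_kind(trained_as_spam, original):
    """Name the statistic this training counts against, if any."""
    if original == "ham" and trained_as_spam:
        return "false_negatives"
    if original == "spam" and not trained_as_spam:
        return "false_positives"
    return None  # unsure, unknown, or training agrees with the original


def record_training(message_db, stats, msg_id, train_as_spam,
                    classified_as=None):
    """Store the training and keep the error counts consistent."""
    if msg_id in message_db:
        # Retraining: undo whatever the previous training counted.
        was_spam, original = read_entry(message_db, msg_id)
        old = error_kind(was_spam, original)
        if old:
            stats[old] -= 1
    else:
        # First training: remember the classification before rescoring.
        original = classified_as
    new = error_kind(train_as_spam, original)
    if new:
        stats[new] += 1
    message_db[msg_id] = (train_as_spam, original)
```

With this layout, training a ham message as spam and then back to ham leaves
the false negative count where it started, while entries written by older
versions are left uncounted because their original classification is unknown.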

* Of course, we store only the score, so if the thresholds have changed, all
bets are off.

** Well, other than adding another field to the message, or something like
that.

=Tony.Meyer


