[spambayes-dev] Incorrect Outlook stats
tameyer at ihug.co.nz
Tue Nov 23 00:16:59 CET 2004
> I just started working on getting the extended statistics
> into the Outlook addin, and I noticed something in the stat
> tracking that isn't doing what I think it should.
> My SpamBayes accuracy has been so good that I have had no
> false positives or negatives since Tony added permanent
> accumulation of the statistics.
That is good :)
> So, in order to test the
> statistics for incorrect classifications, I trained one of my
> good messages as spam and checked to see that it showed up as
> a false negative.
> I then trained the message back to good, but the statistics
> still showed that I had one false negative. It seems like
> the correct behavior would be to erase the false negative if
> the message is trained back to its original classification.
An odd case, to be sure :)
> I thought it would be a simple matter to check if we are
> training back to the original classification and just
> decrement the appropriate statistic. The non-Outlook apps
> store the original classification in the message info db, but
> it doesn't appear that the Outlook addin does this. Anyone
> (Tony?) have any suggestions on how to go about fixing this?
This is similar to what I needed to do to get the original
score/classification in the "show clues" message. In
manager.classifier_data.message_db we store the trained status, but not the
classification. However, we do store the original score in the "Spam" field
(unless that option is turned off, or there was a problem doing so), and can
figure it out from there*.
However, when we train via the buttons we rescore the message, which changes
this field, so that data is lost.
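
A minimal sketch of recovering a classification from a stored score, and of
why a threshold change breaks it. Everything here is an assumption made for
illustration: the function name, the 0.0-1.0 score scale and the default
cutoffs are hypothetical, and this is not the addin's actual code.

```python
# Hypothetical helper: map a stored score back to a classification.
# Assumes scores in the range 0.0-1.0 and illustrative cutoff values.
def classification_from_score(score, ham_cutoff=0.20, spam_cutoff=0.90):
    if score >= spam_cutoff:
        return "spam"
    if score >= ham_cutoff:
        return "unsure"
    return "ham"


# The same stored score maps to a different classification once the
# threshold moves, so the original classification cannot be recovered
# reliably from the score alone.
before = classification_from_score(0.85)                   # cutoff 0.90
after = classification_from_score(0.85, spam_cutoff=0.80)  # cutoff lowered
```

Here `before` is "unsure" and `after` is "spam", even though the message and
its score are unchanged.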
AFAICT the only way** to fix this would be to add more information to the
message_db (a la the non-Outlook version). I believe we can do this in a
backwards-compatible way, although I suspect it will take a fair number of
changes. Should I go ahead and do this?
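
One way this could look, as a rough sketch only. It assumes the existing
message_db values are a bare trained flag (True = trained as spam), that new
records can be dicts, and that the statistics are simple counters. The record
layout and all names (load_record, error_counter, train) are hypothetical, not
existing SpamBayes APIs.

```python
def load_record(raw):
    """Accept old-style values (bare trained flag) and new-style dicts."""
    if isinstance(raw, dict):
        return dict(raw)
    if raw is None:
        return {"trained_as": None, "original": None}
    # Old-style record: only the trained status is known.
    return {"trained_as": "spam" if raw else "ham", "original": None}


def error_counter(original, trained_as):
    """Name of the statistic that this training counts against, if any."""
    if original == "ham" and trained_as == "spam":
        return "false_negatives"
    if original == "spam" and trained_as == "ham":
        return "false_positives"
    return None


def train(db, stats, msg_id, classified_as, train_as):
    record = load_record(db.get(msg_id))
    original_known = record["original"] is not None
    if not original_known:
        # First time we see this message: remember how it was classified.
        record["original"] = classified_as
    else:
        # Undo whatever the previous training counted as an error.
        old = error_counter(record["original"], record["trained_as"])
        if old:
            stats[old] -= 1
    new = error_counter(record["original"], train_as)
    if new:
        stats[new] += 1
    record["trained_as"] = train_as
    db[msg_id] = record


db = {}
stats = {"false_negatives": 0, "false_positives": 0}
train(db, stats, "msg-1", "ham", "spam")   # good message trained as spam
fn_after_mistrain = stats["false_negatives"]
train(db, stats, "msg-1", "spam", "ham")   # trained back; rescore is ignored
fn_after_retrain = stats["false_negatives"]
```

Because the original classification is stored once and never overwritten by a
rescore, training the message back to good removes the false negative again.
Old-style records carry no original classification, so nothing is undone for
them.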
* Of course, we store only the score, so if the thresholds have changed, all
bets are off.
** Well, other than adding another field to the message, or something like