[spambayes-dev] Another piece of anecdotal evidence

Wed Jan 14 14:27:53 EST 2004

In message:  <16389.38097.307314.458183 at montanaro.dyndns.org>
             Skip Montanaro <skip at pobox.com> writes:
>
>    >> How do you plan to find those mistrained messages?
>
>    Alex> As part of my nightly retrain, I'm going to make it score each
>    Alex> message (with the fully trained DB) and sort them into 6
>    Alex> directories for each month: {ham,spam}{positive,unsure,negative}.
>    Alex> Flipping through the hampositive directory for each month should
>    Alex> make it fairly easy to spot the problems...
>
>I'm still confused.  You've got a spam mistrained as ham.  Are you
>suggesting that you expect that scoring that message against your training
>database (which includes features gleaned from that message) will reveal
>that it is something other than ham?

Yes, actually, I do.  The database certainly contains plenty of spams
it has trained on that still score as ham (all the false negatives
I keep seeing), so I expect the reverse to be true as well.

At worst, I'll have to make it compare three things: the score it
computes during training to determine edge-ness, the score it gets
after training, and the ham/spam state I determined by hand.  That
will point out the cases where messages were not in the expected
category to start with and/or where training didn't help...
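
For illustration, the sorting and cross-checking described above could
look roughly like the sketch below.  It is not SpamBayes code: the
names verdict, bucket, Scored and suspects are hypothetical, the cutoff
values are assumed, and "positive" is taken to mean "scored as spam"
(so hampositive holds hand-labelled ham that the trained database calls
spam).  A message is flagged if its final verdict disagrees with its
manual label, or if training moved its score away from that label.

```python
from dataclasses import dataclass

# Assumed, illustrative cutoffs (not read from any SpamBayes config).
HAM_CUTOFF = 0.20
SPAM_CUTOFF = 0.90


def verdict(score: float) -> str:
    """Map a spam probability to positive (spam), unsure, or negative (ham)."""
    if score >= SPAM_CUTOFF:
        return "positive"
    if score <= HAM_CUTOFF:
        return "negative"
    return "unsure"


def bucket(label: str, score: float) -> str:
    """Directory name such as 'hampositive' for a hand-labelled message."""
    return label + verdict(score)


@dataclass
class Scored:
    msg_id: str
    label: str          # manually determined: "ham" or "spam"
    train_score: float  # score seen during training (used for edge-ness)
    final_score: float  # score against the fully trained database


def moved_wrong_way(m: Scored) -> bool:
    """True if training pushed the score away from the manual label."""
    delta = m.final_score - m.train_score
    # Training on ham should lower the score; training on spam should raise it.
    return delta > 0 if m.label == "ham" else delta < 0


def suspects(messages):
    """Yield (msg_id, bucket, train_score, final_score) for likely mistrains."""
    for m in messages:
        expected = "negative" if m.label == "ham" else "positive"
        if verdict(m.final_score) != expected or moved_wrong_way(m):
            yield m.msg_id, bucket(m.label, m.final_score), m.train_score, m.final_score
```

Reviewing the hampositive and spamnegative buckets first, then anything
flagged only because its score drifted, keeps the manual pass short.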

- Alex (who has great faith in the system pointing out errors)