[spambayes-dev] Enhanced Outlook statistics display
kenny.pitt at gmail.com
Tue Dec 7 16:22:08 CET 2004
Tony Meyer <tameyer at ihug.co.nz> wrote:
> As long as we stay away from actually expressing it as a "batting average"
> and avoid baseball terminology. Those numbers are just completely confusing
> to me, and I suspect most non-Americans.
I agree. I think sticking with percentages is the way to go. I suppose
it's a reasonable analogy once it's explained, but it isn't going to
be obvious even to a baseball fanatic just from looking at the numbers.
The main thing I took from the proposal was expressing accuracy
separately for ham and spam instead of taking each as a percentage of
the total messages. Even if 99% of messages have been classified
correctly, it makes a big difference whether the remaining 1%
represents spam that made it through to the inbox vs. ham that was
removed from the inbox by mistake.
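
To make that concrete, here is a rough sketch of the arithmetic. The function name and the message counts are made up for illustration; this is not code from the Outlook plugin.

```python
def per_class_accuracy(ham_correct, ham_total, spam_correct, spam_total):
    """Return (overall, ham, spam) accuracy as percentages.

    Hypothetical helper: assumes both totals are non-zero.
    """
    overall = 100.0 * (ham_correct + spam_correct) / (ham_total + spam_total)
    ham = 100.0 * ham_correct / ham_total
    spam = 100.0 * spam_correct / spam_total
    return overall, ham, spam


# Invented counts: 500 ham and 500 spam, with 10 mistakes in each scenario.
# Scenario A: all 10 mistakes are spam left in the inbox.
missed_spam = per_class_accuracy(500, 500, 490, 500)
# Scenario B: all 10 mistakes are ham removed from the inbox.
lost_ham = per_class_accuracy(490, 500, 500, 500)

print("A: overall %.1f%%, ham %.1f%%, spam %.1f%%" % missed_spam)
print("B: overall %.1f%%, ham %.1f%%, spam %.1f%%" % lost_ham)
```

Both scenarios come out at 99% overall, and only the separate ham and spam figures show which kind of mistake was made.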
I'm still a little unsure (ok, pun intended, couldn't resist <wink>)
how to treat unsures in this. Currently I'm showing the primary
accuracy results based on the number of messages that SpamBayes
classified as either ham or spam, with a separate percentage showing
the additional messages that were classified as unsure.
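
Roughly, that calculation looks like the sketch below. The names and counts are invented, and I am assuming here that the unsure percentage is taken over all messages, which is only one possible choice of denominator.

```python
def accuracy_excluding_unsure(correct, incorrect, unsure):
    """Return (accuracy, unsure_pct) as percentages.

    accuracy counts only messages classified as ham or spam;
    unsure_pct is the share of all messages classified as unsure
    (an assumed denominator, not necessarily what the plugin uses).
    """
    decided = correct + incorrect
    total = decided + unsure
    accuracy = 100.0 * correct / decided if decided else 0.0
    unsure_pct = 100.0 * unsure / total if total else 0.0
    return accuracy, unsure_pct


# Invented counts: 891 right, 9 wrong, 100 unsure out of 1000 messages.
accuracy, unsure_pct = accuracy_excluding_unsure(891, 9, 100)
print("accuracy %.1f%%, unsure %.1f%%" % (accuracy, unsure_pct))
```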
Another option I considered was measuring the percentage of messages
removed from the inbox. It seems that ham and spam are somewhat
asymmetric with regard to unsures. I suspect that most people are ok
with spam being classified as unsure as long as it isn't left in their
inbox, but they would prefer not to see a ham message removed from the
inbox even if it is only moved to the unsure bin.
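
A sketch of that alternative measure, counting an unsure as "removed from the inbox" for both ham and spam. Again, the function name and the counts are hypothetical, just to show the asymmetry.

```python
def removed_from_inbox(as_ham, as_unsure, as_spam):
    """Percentage of one class of messages that left the inbox.

    Assumes anything classified as unsure or spam is moved out of
    the inbox, and anything classified as ham stays there.
    """
    total = as_ham + as_unsure + as_spam
    if total == 0:
        return 0.0
    return 100.0 * (as_unsure + as_spam) / total


# Invented counts for 500 real ham and 500 real spam messages.
ham_removed = removed_from_inbox(as_ham=480, as_unsure=15, as_spam=5)
spam_removed = removed_from_inbox(as_ham=10, as_unsure=40, as_spam=450)

print("ham removed from inbox: %.1f%%" % ham_removed)    # want this low
print("spam removed from inbox: %.1f%%" % spam_removed)  # want this high
```

By this measure the unsures count in favour of spam handling but against ham handling, which matches the asymmetry described above.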