[spambayes-dev] More stats talk (warning, long)
tim.one at comcast.net
Fri Sep 12 14:56:38 EDT 2003
> Conceptually, every email is either ham or spam.
Not so in our system. About half the Unsures I get I throw away without
training on, because after 30 seconds of staring at them I simply can't
decide whether they're "really" ham or spam. That used to bother me a year
ago, but doesn't anymore. If we were to classify all spambayes users as
either "fat" or "skinny", *some* of them would get the point quicker <wink>.
> A false positive occurs when a ham is categorized as spam, and a false
> negative occurs when a spam is categorized as ham.
That much is non-controversial.
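
As a minimal sketch of those two definitions (not spambayes code): the function name `error_rates` and the `(actual, predicted)` pair format are made up for illustration. It assumes a strict two-way classifier and takes each rate relative to the total ham or total spam seen.

```python
def error_rates(results):
    """Return (fp_rate, fn_rate) for an iterable of (actual, predicted) pairs.

    Each element of a pair is the string "ham" or "spam".  The pair format
    and the choice of denominators are assumptions made for this sketch.
    """
    n_ham = n_spam = fp = fn = 0
    for actual, predicted in results:
        if actual == "ham":
            n_ham += 1
            if predicted == "spam":
                fp += 1  # ham categorized as spam: false positive
        elif actual == "spam":
            n_spam += 1
            if predicted == "ham":
                fn += 1  # spam categorized as ham: false negative
    # Guard against an empty ham or spam set.
    fp_rate = fp / n_ham if n_ham else 0.0
    fn_rate = fn / n_spam if n_spam else 0.0
    return fp_rate, fn_rate
```

Called with four ham (one scored as spam) and two spam (one scored as ham), this returns a false positive rate of 0.25 and a false negative rate of 0.5.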
> I'm not too sure how to fit "unsures" into this scheme.
Three categories don't fit into two, of course. The Unsure rate (% of email
classified as unsure) is an interesting stat in its own right. The
percentages of initial Unsures later trained as ham, trained as spam, and
never trained are also interesting. Pie charts come to mind.
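
A rough sketch of how those numbers could be tallied, again not spambayes code: `unsure_stats` and its record format are hypothetical. Each message is assumed to be a `(classification, trained_as)` pair, where `classification` is "ham", "spam", or "unsure", and `trained_as` is "ham", "spam", or `None` if the message was never trained on.

```python
from collections import Counter


def unsure_stats(messages):
    """Return (unsure_rate, breakdown), both expressed as percentages.

    unsure_rate is the share of all messages classified as unsure.
    breakdown maps "ham", "spam", and "never" to the share of those
    initial Unsures later trained as ham, trained as spam, or never
    trained.  The record format is an assumption made for this sketch.
    """
    total = len(messages)
    # Keep only the training outcome of messages first scored as unsure.
    outcomes = [trained for cls, trained in messages if cls == "unsure"]
    n_unsure = len(outcomes)
    unsure_rate = 100.0 * n_unsure / total if total else 0.0

    counts = Counter(outcomes)
    breakdown = {}
    for label, key in (("ham", "ham"), ("spam", "spam"), ("never", None)):
        breakdown[label] = 100.0 * counts[key] / n_unsure if n_unsure else 0.0
    return unsure_rate, breakdown
```

The three values in `breakdown` always add up to 100 when there is at least one Unsure, so they are the slices of the pie chart.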