[spambayes-dev] patch to improve statistics from spambayes

Mark Moraes moraes at sbcglobal.net
Thu Feb 26 19:43:52 EST 2004


Hi.

While I'm generally very happy with SpamBayes, I was a bit
confused by the statistics, which didn't seem to add up.

I'm using SpamBayes 1.0a9 (the web page says SpamBayes POP3
Proxy Version 0.4, February 2004) on Windows 2000 SP4, via the
POP3 interface.  I've tried a couple of different mail agents,
including a command-line POP3 fetch, OE and Mozilla mail; I see
similar results with all of them, and the numbers below are from
the command-line fetch.  I have 'Lookup message in cache' set to
yes, 'Notate to' set to unsure, and 'Classify subject' set to
spam, and I suppress caching of bulk ham.

After a POP3 fetch, the Statistics page says:

SpamBayes has processed 1150 messages - 754 (66%) good, 333 (29%) spam and 63 (5%) unsure.
324 messages were manually classified as good (0 were false positives).
379 messages were manually classified as spam (33 were false negatives).
6 unsure messages were manually identified as good, and 52 as spam.


** 1.

6 unsure-good plus 52 unsure-spam adds up to 58, but the processed
line says 63.  So it's not clear how many messages were actually
manually reviewed/trained (presumably the remaining 5 unsures were
never trained, but the page doesn't say).

** 2.

It's not clear that 'manually classified as good' helps figure out
what was accurately classified as good, because that count lumps
together ham, spam and unsures that were so classified.  Ditto for spam.

It's also not clear how the 324 messages manually classified as
good relate to the 754 good, or how the 379 manually classified as
spam relate to the 333 spam.  As a result, it's hard to estimate
accuracy.

** 3.

After using the Review web page to train, marking all 4 unsures as
spam and 2 hams as spam, and leaving all spam as-is (yay!), I see:

SpamBayes has processed 1150 messages - 754 (66%) good, 333 (29%) spam and 63 (5%) unsure.
333 messages were manually classified as good (0 were false positives).
414 messages were manually classified as spam (35 were false negatives).
6 unsure messages were manually identified as good, and 56 as spam.

The false positive count is clearly a bug: I just classified
2 hams as spam, and I know I've done that often, yet I've never
had to classify a spam as ham.  It looks like the fp and fn
counts are swapped.
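
For concreteness, here's a minimal sketch of the swapped-increment
pattern I believe is at fault.  The names below are hypothetical,
not the actual Stats.py identifiers; the point is only where each
training event should be charged:

    # Hypothetical sketch; the real Stats.py counters are named differently.
    class TrainingCounts:
        def __init__(self):
            self.ham_trained_as_spam = 0   # classifier said good, I said spam
            self.spam_trained_as_ham = 0   # classifier said spam, I said good

        def record(self, classified_as_spam, trained_as_spam):
            if trained_as_spam and not classified_as_spam:
                # The unpatched code charges this correction to the
                # wrong counter; the patch charges it to the ham side.
                self.ham_trained_as_spam += 1
            elif not trained_as_spam and classified_as_spam:
                self.spam_trained_as_ham += 1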

The enclosed patch fixes that inversion, adds a few counters
to tell which ham was manually identified as spam and vice
versa, as well as totals for ham, spam and manually reviewed
messages, so one can calculate percentages.  (The calculation
is conservative: false positives over manually-reviewed ham,
and false negatives over manually-reviewed spam, so that
unreviewed messages don't skew the percentages.)  I also
trimmed the statements somewhat to avoid over-long lines
(removed some verbs :-).
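
In code, the conservative percentage calculation amounts to
something like this (again just a sketch with illustrative names,
not the patch verbatim; the false-negative case is symmetric):

    # Divide by manually-reviewed ham only, so messages that were
    # never reviewed can't dilute the false-positive rate.
    def fp_percent(num_fp, num_reviewed_ham):
        if num_reviewed_ham == 0:
            return 0.0
        return 100.0 * num_fp / num_reviewed_ham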
 
Before the enclosed patch, Stats.py produces:
SpamBayes has processed 1223 messages - 827 (68%) good, 333 (27%) spam and 63 (5%) unsure.
346 messages were manually classified as good (0 were false positives).
414 messages were manually classified as spam (35 were false negatives).
6 unsure messages were manually identified as good, and 56 as spam. 

With the patch, Stats.py produces:
Classified 1223 messages - 827 (68%) ham, 333 (27%) spam and 63 (5%) unsure.
Manually trained 760 messages:
340 of 375 ham messages manually confirmed (35 false positives 4.2%).
323 of 323 spam messages manually confirmed (0 false negatives 0.0%).
Of 62 unsure messages, 6 (9.7%) manually identified as ham, 56 (90.3%) as spam.

I find this much more useful -- hope you agree.

Regards,
    Mark.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: DIFF-spambayes10a9-stats
Type: application/octet-stream
Size: 6661 bytes
Desc: not available
Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20040226/e059599b/DIFF-spambayes10a9-stats-0001.obj

