[Spambayes] Need testers!

Sjoerd Mullender sjoerd@acm.org
Mon, 16 Sep 2002 17:00:59 +0200


I've been saving all my incoming mail for just over 2 weeks now, and
tried this test on my data.  I have collected 3117 hams and 633 spams,
which I divided into 4 sets of 150 hams and 150 spams each (with the
rest left in the reservoirs).

I didn't include messages to python-list-admin@python.org or
postmaster@oratrix.com in my corpus.

Here is the result of the test run with your first suggested change
(just adjust_probs_by_evidence_mass: True):

Before: adjust_probs_by_evidence_mass: False
After:  adjust_probs_by_evidence_mass: True

"""
-> <stat> tested 150 hams & 150 spams against 450 hams & 450 spams
-> <stat> tested 150 hams & 150 spams against 450 hams & 450 spams
-> <stat> tested 150 hams & 150 spams against 450 hams & 450 spams
-> <stat> tested 150 hams & 150 spams against 450 hams & 450 spams
-> <stat> tested 150 hams & 150 spams against 450 hams & 450 spams
-> <stat> tested 150 hams & 150 spams against 450 hams & 450 spams
-> <stat> tested 150 hams & 150 spams against 450 hams & 450 spams
-> <stat> tested 150 hams & 150 spams against 450 hams & 450 spams

false positive percentages
    0.000  0.000  tied
    0.667  0.000  won   -100.00%
    0.000  0.000  tied
    0.667  0.667  tied

won   1 times
tied  3 times
lost  0 times

total unique fp went from 2 to 1 won    -50.00%
mean fp % went from 0.333333333334 to 0.166666666667 won    -50.00%

false negative percentages
    2.000  1.333  won    -33.35%
    0.667  1.333  lost   +99.85%
    0.667  1.333  lost   +99.85%
    2.000  2.000  tied

won   1 times
tied  1 times
lost  2 times

total unique fn went from 8 to 9 lost   +12.50%
mean fn % went from 1.33333333333 to 1.5 lost   +12.50%
"""
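
For anyone who wants to check the arithmetic behind the summary lines, here is a minimal sketch, not the comparison script itself. The helper names `rate` and `pct_change` are made up for illustration, and the per-set error counts are inferred from the percentages (with 150 messages per set, 1 error is 0.667% and 3 errors are 2.000%). The per-set figures such as -33.35% and +99.85% appear to be consistent with taking the relative change of the rounded three-decimal rates.

```python
SET_SIZE = 150  # hams (or spams) per test set, as in the runs above


def rate(errors, n=SET_SIZE):
    """Error rate in percent for `errors` misclassified messages out of `n`."""
    return 100.0 * errors / n


def pct_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    if before == 0:
        raise ValueError("relative change is undefined when before == 0")
    return 100.0 * (after - before) / before


# False negative counts per set, inferred from the table (assumption):
# 2.000% -> 3, 1.333% -> 2, 0.667% -> 1.
fn_before = [3, 1, 1, 3]
fn_after = [2, 2, 2, 3]

mean_before = sum(rate(e) for e in fn_before) / len(fn_before)
mean_after = sum(rate(e) for e in fn_after) / len(fn_after)

print("total fn:", sum(fn_before), "->", sum(fn_after))
print("mean fn %:", round(mean_before, 3), "->", round(mean_after, 3))
print("change in mean: %+.2f%%" % pct_change(mean_before, mean_after))

# Per-set changes computed from the rounded rates as printed in the table.
print("2.000 -> 1.333: %+.2f%%" % pct_change(2.000, 1.333))
print("0.667 -> 1.333: %+.2f%%" % pct_change(0.667, 1.333))
```

Note that a relative change is undefined when the "before" rate is zero, which is why the sketch raises an error in that case instead of guessing.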

And here with the later suggested change:
[Classifier]
adjust_probs_by_evidence_mass: True
min_spamprob: 0.001
max_spamprob: 0.999
hambias: 1.5

"""
-> <stat> tested 150 hams & 150 spams against 450 hams & 450 spams
-> <stat> tested 150 hams & 150 spams against 450 hams & 450 spams
-> <stat> tested 150 hams & 150 spams against 450 hams & 450 spams
-> <stat> tested 150 hams & 150 spams against 450 hams & 450 spams
-> <stat> tested 150 hams & 150 spams against 450 hams & 450 spams
-> <stat> tested 150 hams & 150 spams against 450 hams & 450 spams
-> <stat> tested 150 hams & 150 spams against 450 hams & 450 spams
-> <stat> tested 150 hams & 150 spams against 450 hams & 450 spams

false positive percentages
    0.000  0.000  tied
    0.667  0.667  tied
    0.000  0.000  tied
    0.667  0.667  tied

won   0 times
tied  4 times
lost  0 times

total unique fp went from 2 to 2 tied
mean fp % went from 0.333333333334 to 0.333333333334 tied

false negative percentages
    2.000  1.333  won    -33.35%
    0.667  0.667  tied
    0.667  0.667  tied
    2.000  1.333  won    -33.35%

won   2 times
tied  2 times
lost  0 times

total unique fn went from 8 to 6 won    -25.00%
mean fn % went from 1.33333333333 to 0.999999999998 won    -25.00%
"""
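
The [Classifier] settings quoted above use plain "key: value" lines, so they can be read back with Python's standard configparser module, for example to record which options a given run used. This is only a sketch for inspecting the settings; it is not a description of how the classifier itself loads its options, and the `OPTIONS_TEXT` and `settings` names are just for illustration.

```python
import configparser

# The option block from the second run, copied verbatim.
OPTIONS_TEXT = """\
[Classifier]
adjust_probs_by_evidence_mass: True
min_spamprob: 0.001
max_spamprob: 0.999
hambias: 1.5
"""

parser = configparser.ConfigParser()
parser.read_string(OPTIONS_TEXT)  # ':' is accepted as a key/value delimiter

section = parser["Classifier"]
settings = {
    "adjust_probs_by_evidence_mass": section.getboolean(
        "adjust_probs_by_evidence_mass"
    ),
    "min_spamprob": section.getfloat("min_spamprob"),
    "max_spamprob": section.getfloat("max_spamprob"),
    "hambias": section.getfloat("hambias"),
}

for name, value in settings.items():
    print(name, "=", value)
```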

-- Sjoerd Mullender <sjoerd@acm.org>