[Spambayes] Need testers!
Sjoerd Mullender
sjoerd@acm.org
Mon, 16 Sep 2002 17:00:59 +0200
I've been saving all my incoming mail for just over 2 weeks now, and
tried this test on my data. I have collected 3117 hams and 633 spams,
from which I made 4 sets of 150 hams and 4 sets of 150 spams (with the
rest left in the reservoirs).
I didn't include messages to python-list-admin@python.org or
postmaster@oratrix.com in my corpus.
Here is the result of the test run with your first suggested change
(just adjust_probs_by_evidence_mass: True):
Before: adjust_probs_by_evidence_mass: False
After: adjust_probs_by_evidence_mass: True
"""
-> <stat> tested 150 hams & 150 spams against 450 hams & 450 spams
-> <stat> tested 150 hams & 150 spams against 450 hams & 450 spams
-> <stat> tested 150 hams & 150 spams against 450 hams & 450 spams
-> <stat> tested 150 hams & 150 spams against 450 hams & 450 spams
-> <stat> tested 150 hams & 150 spams against 450 hams & 450 spams
-> <stat> tested 150 hams & 150 spams against 450 hams & 450 spams
-> <stat> tested 150 hams & 150 spams against 450 hams & 450 spams
-> <stat> tested 150 hams & 150 spams against 450 hams & 450 spams
false positive percentages
0.000 0.000 tied
0.667 0.000 won -100.00%
0.000 0.000 tied
0.667 0.667 tied
won 1 times
tied 3 times
lost 0 times
total unique fp went from 2 to 1 won -50.00%
mean fp % went from 0.333333333334 to 0.166666666667 won -50.00%
false negative percentages
2.000 1.333 won -33.35%
0.667 1.333 lost +99.85%
0.667 1.333 lost +99.85%
2.000 2.000 tied
won 1 times
tied 1 times
lost 2 times
total unique fn went from 8 to 9 lost +12.50%
mean fn % went from 1.33333333333 to 1.5 lost +12.50%
"""
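A note on reading these numbers: each set holds 150 messages, so 0.667 means one misclassified message. The odd-looking changes such as -33.35% and +99.85% seem to come from computing the relative change on the rates after rounding to three decimals, while the totals are compared as plain counts. Here is a minimal Python sketch of that arithmetic; the function names are made up for illustration and are not taken from the Spambayes test scripts.

```python
def rate_percent(errors, tested):
    """Error rate as a percentage of the messages tested."""
    return 100.0 * errors / tested


def relative_change(before, after):
    """Percentage change from before to after; None when before is zero."""
    if before == 0:
        return None
    return 100.0 * (after - before) / before


# Rates for 1, 2 and 3 errors in a 150-message set, rounded to three
# decimals the way they are printed in the report.
one_error = round(rate_percent(1, 150), 3)
two_errors = round(rate_percent(2, 150), 3)
three_errors = round(rate_percent(3, 150), 3)

# Relative change on the rounded rates (assumed to be what the report does).
three_to_two = round(relative_change(three_errors, two_errors), 2)
one_to_two = round(relative_change(one_error, two_errors), 2)

# Relative change on the total unique counts.
fp_total_change = relative_change(2, 1)
fn_total_change = relative_change(8, 9)

print(one_error, two_errors, three_errors)
print(three_to_two, one_to_two)
print(fp_total_change, fn_total_change)
```

Working from the unrounded rates would give -33.33% and +100.00% instead, which is why the per-set lines do not quite match the simple fractions.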
And here is the result with the later suggested change:
[Classifier]
adjust_probs_by_evidence_mass: True
min_spamprob: 0.001
max_spamprob: 0.999
hambias: 1.5
"""
-> <stat> tested 150 hams & 150 spams against 450 hams & 450 spams
-> <stat> tested 150 hams & 150 spams against 450 hams & 450 spams
-> <stat> tested 150 hams & 150 spams against 450 hams & 450 spams
-> <stat> tested 150 hams & 150 spams against 450 hams & 450 spams
-> <stat> tested 150 hams & 150 spams against 450 hams & 450 spams
-> <stat> tested 150 hams & 150 spams against 450 hams & 450 spams
-> <stat> tested 150 hams & 150 spams against 450 hams & 450 spams
-> <stat> tested 150 hams & 150 spams against 450 hams & 450 spams
false positive percentages
0.000 0.000 tied
0.667 0.667 tied
0.000 0.000 tied
0.667 0.667 tied
won 0 times
tied 4 times
lost 0 times
total unique fp went from 2 to 2 tied
mean fp % went from 0.333333333334 to 0.333333333334 tied
false negative percentages
2.000 1.333 won -33.35%
0.667 0.667 tied
0.667 0.667 tied
2.000 1.333 won -33.35%
won 2 times
tied 2 times
lost 0 times
total unique fn went from 8 to 6 won -25.00%
mean fn % went from 1.33333333333 to 0.999999999998 won -25.00%
"""
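The [Classifier] settings used for this second run are in ordinary INI form, so they can be sanity-checked with Python's standard configparser module. This is only a sketch for reading the values back; it makes no claim about how Spambayes itself loads its options, and the variable names are illustrative.

```python
import configparser

# The option block from this message, copied verbatim.
CONFIG_TEXT = """\
[Classifier]
adjust_probs_by_evidence_mass: True
min_spamprob: 0.001
max_spamprob: 0.999
hambias: 1.5
"""

parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)
classifier = parser["Classifier"]

# Convert each value to the type it appears to have in the listing.
adjust = classifier.getboolean("adjust_probs_by_evidence_mass")
min_spamprob = classifier.getfloat("min_spamprob")
max_spamprob = classifier.getfloat("max_spamprob")
hambias = classifier.getfloat("hambias")

print(adjust, min_spamprob, max_spamprob, hambias)
```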
-- Sjoerd Mullender <sjoerd@acm.org>