[spambayes-dev] Results for DNS lookup in tokenizer
Tony Meyer
tameyer at ihug.co.nz
Tue Apr 13 19:16:37 EDT 2004
> Here are my results using timcv.py -n5 with two corpora.
> First cmp.py results, then a table.py with just running with
> defaults as well.
And here are two more (they were running too slowly to post yesterday, but
completed overnight).

The first is my non-work mail from the last few months; the second is the
five sets that make up the SpamAssassin Public Archive (the bzip files
starting with 2003...), labelled SAPC below.
Once again, the standard x-pick_apart_urls option does nothing (good or bad)
for me. With the DNS lookup, the SAPC corpus is only just a loss, and my own
mail is a more substantial loss (although each wins one of the five runs).
-> <stat> tested 4692 hams & 386 spams against 18762 hams & 1537 spams
-> <stat> tested 4695 hams & 381 spams against 18759 hams & 1542 spams
-> <stat> tested 4693 hams & 383 spams against 18761 hams & 1540 spams
-> <stat> tested 4690 hams & 384 spams against 18764 hams & 1539 spams
-> <stat> tested 4684 hams & 389 spams against 18770 hams & 1534 spams
-> <stat> tested 4692 hams & 386 spams against 18762 hams & 1537 spams
-> <stat> tested 4695 hams & 381 spams against 18759 hams & 1542 spams
-> <stat> tested 4693 hams & 383 spams against 18761 hams & 1540 spams
-> <stat> tested 4690 hams & 384 spams against 18764 hams & 1539 spams
-> <stat> tested 4684 hams & 389 spams against 18770 hams & 1534 spams
false positive percentages
0.000 0.000 tied
0.021 0.021 tied
0.000 0.000 tied
0.000 0.000 tied
0.000 0.000 tied
won 0 times
tied 5 times
lost 0 times
total unique fp went from 1 to 1 tied
mean fp % went from 0.00425985090522 to 0.00425985090522 tied
false negative percentages
1.036 1.036 tied
1.050 1.575 lost +50.00%
0.783 0.522 won -33.33%
1.823 2.083 lost +14.26%
1.285 1.799 lost +40.00%
won 1 times
tied 1 times
lost 3 times
total unique fn went from 23 to 27 lost +17.39%
mean fn % went from 1.19553834481 to 1.40321699713 lost +17.37%
ham mean                     ham sdev
   0.09    0.10  +11.11%        1.73    1.72   -0.58%
   0.11    0.11   +0.00%        2.24    2.09   -6.70%
   0.12    0.12   +0.00%        2.05    2.05   +0.00%
   0.09    0.08  -11.11%        2.01    1.78  -11.44%
   0.04    0.05  +25.00%        0.88    1.19  +35.23%
ham mean and sdev for all runs
   0.09    0.09   +0.00%        1.85    1.80   -2.70%
spam mean                    spam sdev
  95.65   95.35   -0.31%       15.15   16.13   +6.47%
  95.77   95.20   -0.60%       15.18   16.83  +10.87%
  97.06   96.05   -1.04%       11.42   13.61  +19.18%
  95.32   94.61   -0.74%       16.75   18.41   +9.91%
  95.57   95.40   -0.18%       15.57   16.05   +3.08%
spam mean and sdev for all runs
  95.87   95.32   -0.57%       14.94   16.29   +9.04%
ham/spam mean difference: 95.78 95.23 -0.55
-> <stat> tested 830 hams & 380 spams against 3320 hams & 1517 spams
-> <stat> tested 830 hams & 380 spams against 3320 hams & 1517 spams
-> <stat> tested 830 hams & 379 spams against 3320 hams & 1518 spams
-> <stat> tested 830 hams & 379 spams against 3320 hams & 1518 spams
-> <stat> tested 830 hams & 379 spams against 3320 hams & 1518 spams
-> <stat> tested 830 hams & 380 spams against 3320 hams & 1517 spams
-> <stat> tested 830 hams & 380 spams against 3320 hams & 1517 spams
-> <stat> tested 830 hams & 379 spams against 3320 hams & 1518 spams
-> <stat> tested 830 hams & 379 spams against 3320 hams & 1518 spams
-> <stat> tested 830 hams & 379 spams against 3320 hams & 1518 spams
false positive percentages
0.241 0.241 tied
0.482 0.482 tied
0.000 0.000 tied
0.120 0.120 tied
0.000 0.000 tied
won 0 times
tied 5 times
lost 0 times
total unique fp went from 7 to 7 tied
mean fp % went from 0.168674698795 to 0.168674698795 tied
false negative percentages
0.789 1.053 lost +33.46%
0.526 0.526 tied
0.528 0.264 won -50.00%
0.264 0.264 tied
1.055 1.319 lost +25.02%
won 1 times
tied 2 times
lost 2 times
total unique fn went from 12 to 13 lost +8.33%
mean fn % went from 0.632551034579 to 0.685182613526 lost +8.32%
ham mean                     ham sdev
   0.67    0.61   -8.96%        6.87    6.56   -4.51%
   0.95    0.85  -10.53%        8.69    8.08   -7.02%
   0.87    0.81   -6.90%        7.10    6.79   -4.37%
   0.60    0.57   -5.00%        6.64    6.49   -2.26%
   0.48    0.42  -12.50%        4.87    4.62   -5.13%
ham mean and sdev for all runs
   0.71    0.65   -8.45%        6.94    6.60   -4.90%
spam mean                    spam sdev
  97.13   96.89   -0.25%       12.08   13.00   +7.62%
  98.59   98.50   -0.09%        8.09    8.49   +4.94%
  98.57   98.44   -0.13%        8.03    8.15   +1.49%
  98.59   98.54   -0.05%        7.51    7.68   +2.26%
  97.91   97.72   -0.19%       11.50   12.22   +6.26%
spam mean and sdev for all runs
  98.16   98.02   -0.14%        9.66   10.18   +5.38%
ham/spam mean difference: 97.45 97.37 -0.08
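
For anyone checking the won/lost percentages by hand: they look like plain
relative changes, (after - before) / before. Here is a minimal sketch under
that assumption. The helper name pct_change is mine and is not taken from
cmp.py. The per-run lines appear to be worked out from the rounded
percentages as printed, which would explain why 0.789 to 1.053 shows as
+33.46% rather than the +33.33% you would get from 3 versus 4 misses.

```python
def pct_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# (label, before, after) copied from the false negative sections above
checks = [
    ("my mail, total unique fn", 23, 27),
    ("my mail, mean fn %", 1.19553834481, 1.40321699713),
    ("SAPC, total unique fn", 12, 13),
    ("SAPC, mean fn %", 0.632551034579, 0.685182613526),
    ("SAPC, run 1 fn % (rounded)", 0.789, 1.053),
]

for label, before, after in checks:
    print("%-28s %+.2f%%" % (label, pct_change(before, after)))
```

The five lines printed should agree with the +17.39%, +17.37%, +8.33%,
+8.32% and +33.46% reported above.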
filename:          ihugs   ihug_picks  ihug_pickms
ham:spam:     23454:1923   23454:1923   23454:1923
fp total:              1            1            1
fp %:               0.00         0.00         0.00
fn total:             23           23           27
fn %:               1.20         1.20         1.40
unsure t:            169          171          176
unsure %:           0.67         0.67         0.69
real cost:        $66.80       $67.20       $72.20
best cost:        $57.00       $56.60       $62.40
h mean:             0.09         0.09         0.09
h sdev:             1.89         1.85         1.80
s mean:            95.86        95.87        95.32
s sdev:            14.99        14.94        16.29
mean diff:         95.77        95.78        95.23
k:                  5.67         5.70         5.26

filename:          sapcs   sapc_picks  sapc_pickms
ham:spam:      4150:1897    4150:1897    4150:1897
fp total:              7            7            7
fp %:               0.17         0.17         0.17
fn total:             12           12           13
fn %:               0.63         0.63         0.69
unsure t:             99           99          100
unsure %:           1.64         1.64         1.65
real cost:       $101.80      $101.80      $103.00
best cost:        $70.60       $70.20       $70.80
h mean:             0.71         0.71         0.65
h sdev:             6.92         6.94         6.60
s mean:            98.14        98.16        98.02
s sdev:             9.72         9.66        10.18
mean diff:         97.43        97.45        97.37
k:                  5.86         5.87         5.80
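
As a sanity check on the table: the "real cost" and "k" rows can be
reproduced from the other rows. The sketch below assumes weights of $10 per
false positive, $1 per false negative and $0.20 per unsure, and assumes
k = mean diff / (h sdev + s sdev). Both are inferred from the fact that they
reproduce the figures above, not read out of table.py, and the function
names are mine.

```python
def real_cost(fp, fn, unsure, fp_weight=10.0, fn_weight=1.0, unsure_weight=0.2):
    """Dollar cost of a run; the weights are assumed, not read from config."""
    return fp * fp_weight + fn * fn_weight + unsure * unsure_weight


def k_stat(mean_diff, ham_sdev, spam_sdev):
    """Separation of the ham and spam score distributions (assumed formula)."""
    return mean_diff / (ham_sdev + spam_sdev)


# name: (fp total, fn total, unsure t, mean diff, h sdev, s sdev), from the tables
rows = {
    "ihugs":       (1, 23, 169, 95.77, 1.89, 14.99),
    "ihug_picks":  (1, 23, 171, 95.78, 1.85, 14.94),
    "ihug_pickms": (1, 27, 176, 95.23, 1.80, 16.29),
    "sapcs":       (7, 12,  99, 97.43, 6.92,  9.72),
    "sapc_picks":  (7, 12,  99, 97.45, 6.94,  9.66),
    "sapc_pickms": (7, 13, 100, 97.37, 6.60, 10.18),
}

for name, (fp, fn, unsure, diff, h_sdev, s_sdev) in rows.items():
    cost = real_cost(fp, fn, unsure)
    k = k_stat(diff, h_sdev, s_sdev)
    print("%-12s real cost $%.2f  k %.2f" % (name, cost, k))
```

On these numbers the extra cost of the third column comes almost entirely
from the additional false negatives and unsures; the false positive count
does not move in either corpus.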