[Spambayes] Testers needed with unbalanced spam::ham training data
Richie Hindle
richie@entrian.com
Sun Nov 17 23:07:25 2002
> [Classifier]
> experimental_ham_spam_imbalance_adjustment: True
Four runs: with and without experimental_ham_spam_imbalance_adjustment, and
with a 10:1 imbalance between ham and spam in each direction:
lowham[_adj]: timcv.py -n10 --ham=20 --spam=200 -s1
lowspam[_adj]: timcv.py -n10 --ham=200 --spam=20 -s1
filename:       lowham  lowham_adj
ham:spam:     200:2000    200:2000
fp total:           15           2
fp %:             7.50        1.00
fn total:            1           1
fn %:             0.05        0.05
unsure t:           37          42
unsure %:         1.68        1.91
real cost:     $158.40      $29.40
best cost:      $67.20      $26.40
h mean:          17.41        8.38
h sdev:          31.13       20.20
s mean:          99.90       99.66
s sdev:           2.47        3.35
mean diff:       82.49       91.28
k:                2.46        3.88
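
For anyone reading the table cold: the k row appears to be the gap between the spam and ham score means divided by the sum of the two standard deviations. That definition is inferred from the figures, not stated in this post, but it does reproduce both k values above. A minimal Python sketch (the function name separation_k is made up for illustration):

```python
def separation_k(h_mean, h_sdev, s_mean, s_sdev):
    """Assumed definition: (spam mean - ham mean) / (ham sdev + spam sdev)."""
    return (s_mean - h_mean) / (h_sdev + s_sdev)

# Figures copied from the lowham and lowham_adj columns:
# (h mean, h sdev, s mean, s sdev)
runs = {
    "lowham": (17.41, 31.13, 99.90, 2.47),
    "lowham_adj": (8.38, 20.20, 99.66, 3.35),
}

for name, stats in runs.items():
    print(f"{name}: k = {separation_k(*stats):.2f}")
```

A larger k means the ham and spam score distributions overlap less, which is why the adjusted run looks better on this row.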
filename:      lowspam lowspam_adj
ham:spam:     2000:200    2000:200
fp total:            0           1
fp %:             0.00        0.05
fn total:           10           1
fn %:             5.00        0.50
unsure t:           35          72
unsure %:         1.59        3.27
real cost:      $17.00      $25.40
best cost:      $10.80       $7.00
h mean:           0.18        1.61
h sdev:           2.08        7.13
s mean:          89.39       96.69
s sdev:          23.92       10.59
mean diff:       89.21       95.08
k:                3.43        5.37
The fp introduced in lowspam_adj is a very spammy HTML email from an ISP -
it has always shown up as an fp in my corpus.
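
As a cross-check on the "real cost" rows, the sketch below recomputes them from the fp, fn and unsure counts. The weights ($10 per fp, $1 per fn, $0.20 per unsure) are an assumption: they are not given in this post, but they reproduce all four reported figures. The names here are illustrative only.

```python
# Assumed cost weights in dollars; inferred from the reported figures.
FP_WEIGHT = 10.00
FN_WEIGHT = 1.00
UNSURE_WEIGHT = 0.20

def real_cost(fp, fn, unsure):
    """Total cost of a run under the assumed weights."""
    return fp * FP_WEIGHT + fn * FN_WEIGHT + unsure * UNSURE_WEIGHT

# (fp total, fn total, unsure t) copied from the four runs.
counts = {
    "lowham": (15, 1, 37),
    "lowham_adj": (2, 1, 42),
    "lowspam": (0, 10, 35),
    "lowspam_adj": (1, 1, 72),
}

for name, (fp, fn, unsure) in counts.items():
    print(f"{name}: ${real_cost(fp, fn, unsure):.2f}")
```

Under these weights the single introduced fp in lowspam_adj costs $10, and the extra 37 unsures cost $7.40, which together outweigh the $9 saved on false negatives. That accounts for the real cost rising from $17.00 to $25.40 even though fn dropped from 10 to 1.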
--
Richie Hindle
richie@entrian.com