[Spambayes] Testers needed with unbalanced spam::ham training data

Sjoerd Mullender sjoerd@acm.org
Mon Nov 18 11:29:32 2002


On Sun, Nov 17 2002 Tim Peters wrote:

> If you have a strong imbalance between the # of ham and # of spam in your
> training data (or even if you don't but can spare the effort), please do a
> before-and-after test, where after adds the new option:
> 
> [Classifier]
> experimental_ham_spam_imbalance_adjustment: True
> 
> I expect this option to go away and become the default, but it needs testing
> first before I'll do that.

It doesn't look like a win for me:

cv1 uses all defaults; cv2 adds
experimental_ham_spam_imbalance_adjustment: True

filename:      cv1     cv2
ham:spam:  14600:4000 14600:4000
fp total:        8      16
fp %:         0.05    0.11
fn total:        3       3
fn %:         0.07    0.07
unsure t:       97     108
unsure %:     0.52    0.58
real cost: $102.40 $184.60
best cost:  $43.60 $137.80
h mean:       0.24    0.40
h sdev:       3.64    4.80
s mean:      99.44   99.65
s sdev:       5.00    3.92
mean diff:   99.20   99.25
k:           11.48   11.38
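
For anyone checking the arithmetic in the table above, here is a minimal sketch that recomputes the percentage rows and the "real cost" row from the raw counts. The cost weights ($10 per false positive, $1 per false negative, $0.20 per unsure) are an assumption: they are not stated in this message, but they reproduce both reported "real cost" figures. The function name `summarize` is hypothetical and not part of the Spambayes code.

```python
# Assumed per-message cost weights. They are not given in the post; they
# were chosen because they reproduce the reported "real cost" row.
FP_COST = 10.00
FN_COST = 1.00
UNSURE_COST = 0.20


def summarize(n_ham, n_spam, fp, fn, unsure):
    """Return percentage rates and total cost for one test run.

    fp is measured against the ham count, fn against the spam count,
    and unsure against all messages.
    """
    return {
        "fp_pct": 100.0 * fp / n_ham,
        "fn_pct": 100.0 * fn / n_spam,
        "unsure_pct": 100.0 * unsure / (n_ham + n_spam),
        "real_cost": fp * FP_COST + fn * FN_COST + unsure * UNSURE_COST,
    }


# Raw counts copied from the table (14600 ham, 4000 spam in both runs).
cv1 = summarize(14600, 4000, fp=8, fn=3, unsure=97)
cv2 = summarize(14600, 4000, fp=16, fn=3, unsure=108)

for name, run in (("cv1", cv1), ("cv2", cv2)):
    print(f"{name}: fp {run['fp_pct']:.2f}%  "
          f"unsure {run['unsure_pct']:.2f}%  "
          f"real cost ${run['real_cost']:.2f}")
```

Under those assumed weights, the $82.20 rise in real cost from cv1 to cv2 breaks down as $80.00 from the 8 extra false positives plus $2.20 from the 11 extra unsures.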

-- Sjoerd Mullender <sjoerd@acm.org>


