[spambayes-dev] Another incremental training idea...
Toby Dickenson
tdickenson at geminidataloggers.com
Thu Jan 15 06:48:26 EST 2004
On Thursday 15 January 2004 03:05, Tim Peters wrote:
> [Skip Montanaro]
>
> > ...
> > It does seem a bit arbitrary, but the system seems to suggest
> > we need to be slaves to balance and that's one way to get it.
>
> Cross validation testing is measuring random-time-order TOE performance,
> and we know imbalance hurts that.
I've finally got the cross validation tools working here, and the first thing I
looked at was imbalance. My normal training set is currently 14k hams and 2k
spams. This test compared that imbalance against three independently selected
balanced sets with 2k of each.
If I'm reading this right, my 7:1 imbalance doesn't hurt me.
filename:        unbal        bal1        bal2        bal3
ham:spam:   14560:1992   1992:1992   1992:1992   1992:1992
fp total:            0           0           1           0
fp %:             0.00        0.00        0.05        0.00
fn total:           12           6           8           6
fn %:             0.60        0.30        0.40        0.30
unsure t:          102          21          23          29
unsure %:         0.62        0.53        0.58        0.73
real cost:      $32.40      $10.20      $22.60      $11.80
best cost:      $27.60       $7.00       $9.80       $8.60
h mean:           0.11        0.23        0.30        0.32
h sdev:           1.89        2.47        3.46        3.26
s mean:          96.93       99.06       99.04       99.02
s sdev:          12.11        6.88        6.98        7.21
mean diff:       96.82       98.83       98.74       98.70
k:                6.92       10.57        9.46        9.43
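
For anyone wanting to check the derived rows, here is a rough sketch that
recomputes them from the raw counts and score statistics in the table. The
cost weights ($10 per fp, $1 per fn, $0.20 per unsure) and the formula for k
(mean diff divided by the sum of the two sdevs) are assumptions inferred from
the numbers, not something stated in this post. They do appear to reproduce
the "real cost" and "k" rows. The names RUNS and summarize are made up for
this illustration.

```python
# Recompute the derived rows of the table from its raw rows.
# ASSUMPTION: these cost weights are inferred from the table, not stated in the post.
FP_COST = 10.00
FN_COST = 1.00
UNSURE_COST = 0.20

# Raw values copied from the table (one entry per column).
RUNS = {
    "unbal": dict(ham=14560, spam=1992, fp=0, fn=12, unsure=102,
                  h_mean=0.11, h_sdev=1.89, s_mean=96.93, s_sdev=12.11),
    "bal1": dict(ham=1992, spam=1992, fp=0, fn=6, unsure=21,
                 h_mean=0.23, h_sdev=2.47, s_mean=99.06, s_sdev=6.88),
    "bal2": dict(ham=1992, spam=1992, fp=1, fn=8, unsure=23,
                 h_mean=0.30, h_sdev=3.46, s_mean=99.04, s_sdev=6.98),
    "bal3": dict(ham=1992, spam=1992, fp=0, fn=6, unsure=29,
                 h_mean=0.32, h_sdev=3.26, s_mean=99.02, s_sdev=7.21),
}


def summarize(run):
    """Return the percentage, cost, and separation rows for one column."""
    total = run["ham"] + run["spam"]
    mean_diff = run["s_mean"] - run["h_mean"]
    return {
        "fp_pct": 100.0 * run["fp"] / run["ham"],        # fp rate over hams
        "fn_pct": 100.0 * run["fn"] / run["spam"],       # fn rate over spams
        "unsure_pct": 100.0 * run["unsure"] / total,     # unsure rate over all mail
        "real_cost": (run["fp"] * FP_COST
                      + run["fn"] * FN_COST
                      + run["unsure"] * UNSURE_COST),
        "mean_diff": mean_diff,
        # ASSUMPTION: k = mean diff / (h sdev + s sdev), inferred from the table.
        "k": mean_diff / (run["h_sdev"] + run["s_sdev"]),
    }


for name, run in RUNS.items():
    s = summarize(run)
    print(f"{name:6s} fn%={s['fn_pct']:.2f} unsure%={s['unsure_pct']:.2f} "
          f"cost=${s['real_cost']:.2f} k={s['k']:.2f}")
```

Note that the unsure % is taken over hams plus spams, while fp % and fn % are
taken over hams and spams respectively, so the unbalanced column's 102 unsures
look small as a percentage only because of the much larger ham count.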
--
Toby Dickenson