[spambayes-dev] Very small change for composite word tokenizing
Meyer, Tony
T.A.Meyer at massey.ac.nz
Thu Aug 7 13:05:49 EDT 2003
And, FWIW, here are results on a different corpus:
filename:  no_sean2s    sean2s   kenny2s
ham:spam:  7580:7580 7580:7580 7580:7580
fp total:         44        47        45
fp %:           0.58      0.62      0.59
fn total:         16        17        17
fn %:           0.21      0.22      0.22
unsure t:        356       348       344
unsure %:       2.35      2.30      2.27
real cost:   $527.20   $556.60   $535.80
best cost:   $592.40   $611.00   $584.40
h mean:         3.40      3.38      3.36
h sdev:        14.19     14.21     14.14
s mean:        97.94     97.95     97.98
s sdev:         9.43      9.49      9.40
mean diff:     94.54     94.57     94.62
k:              4.00      3.99      4.02
Kenny's version again does better than Sean's original, although it still
has 1 FN and 1 FP more than not having it at all, in exchange for 12 fewer
unsures. (I think I would rather have the unsures.)
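
For anyone who wants to check the trade-off, here is a rough sketch of the arithmetic. The weights ($10 per FP, $1 per FN, $0.20 per unsure) are an assumption on my part; they are not stated above, but they do reproduce the "real cost" row. The function and variable names are made up for illustration.

```python
# Assumed cost weights (not stated in the post): they happen to
# reproduce the "real cost" row of the table.
FP_COST = 10.0      # dollars per false positive
FN_COST = 1.0       # dollars per false negative
UNSURE_COST = 0.2   # dollars per unsure


def real_cost(fp, fn, unsure):
    """Weighted cost of one test run under the assumed weights."""
    return fp * FP_COST + fn * FN_COST + unsure * UNSURE_COST


# Baseline run (no composite word tokenizing): 44 FP, 16 FN, 356 unsure.
baseline = (44, 16, 356)

# Kenny's version: 1 FP and 1 FN more, 12 fewer unsures.
kenny = (baseline[0] + 1, baseline[1] + 1, baseline[2] - 12)

baseline_cost = real_cost(*baseline)
kenny_cost = real_cost(*kenny)

print(f"baseline: ${baseline_cost:.2f}")
print(f"kenny:    ${kenny_cost:.2f}")
print(f"delta:    ${kenny_cost - baseline_cost:.2f}")
```

Under those weights the extra FP and FN cost more than the 12 fewer unsures save, which is why the real cost goes up rather than down.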
=Tony Meyer