[spambayes-dev] Very small change for composite word tokenizing

Meyer, Tony T.A.Meyer at massey.ac.nz
Thu Aug 7 13:05:49 EDT 2003


And, FWIW, here are results on a different corpus:

filename:  no_sean2s    sean2s   kenny2s
ham:spam:  7580:7580 7580:7580 7580:7580
fp total:       44      47      45
fp %:         0.58    0.62    0.59
fn total:       16      17      17
fn %:         0.21    0.22    0.22
unsure t:      356     348     344
unsure %:     2.35    2.30    2.27
real cost: $527.20 $556.60 $535.80
best cost: $592.40 $611.00 $584.40
h mean:       3.40    3.38    3.36
h sdev:      14.19   14.21   14.14
s mean:      97.94   97.95   97.98
s sdev:       9.43    9.49    9.40
mean diff:   94.54   94.57   94.62
k:            4.00    3.99    4.02

Kenny's version again does better than Sean's original, although it still
gives 1 FN and 1 FP more than not having it at all, in exchange for 12 fewer
unsures.  (I think I would rather keep the extra unsures.)
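
For anyone wanting to check the arithmetic, here is a rough sketch that recomputes the "real cost" row and the differences quoted above. The weights ($10 per FP, $1 per FN, $0.20 per unsure) are an assumption on my part; they do reproduce the three cost figures in the table. It also assumes the third column is Kenny's version, which is the reading that matches the numbers in the text. The names `runs`, `real_cost` and `delta` are made up for illustration and are not part of the test scripts.

```python
# Counts copied from the table above, keyed by run name.
runs = {
    "no_sean2s": {"fp": 44, "fn": 16, "unsure": 356},
    "sean2s":    {"fp": 47, "fn": 17, "unsure": 348},
    "kenny2s":   {"fp": 45, "fn": 17, "unsure": 344},
}

# Assumed per-message costs in dollars (not stated in the post itself).
FP_COST, FN_COST, UNSURE_COST = 10.00, 1.00, 0.20


def real_cost(counts):
    """Weighted sum of false positives, false negatives and unsures."""
    return (counts["fp"] * FP_COST
            + counts["fn"] * FN_COST
            + counts["unsure"] * UNSURE_COST)


def delta(run, baseline="no_sean2s"):
    """Per-category difference between a run and the baseline run."""
    return {key: runs[run][key] - runs[baseline][key]
            for key in ("fp", "fn", "unsure")}


for name, counts in runs.items():
    print(f"{name:10s} ${real_cost(counts):.2f}")

print("kenny2s vs no_sean2s:", delta("kenny2s"))
```

Under those weights one extra FP costs as much as 50 unsures, so trading 12 unsures for 1 FP and 1 FN comes out as a net cost increase.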

=Tony Meyer


