[spambayes-dev] Very small change for composite word tokenizing.

Meyer, Tony T.A.Meyer at massey.ac.nz
Tue Aug 5 14:05:02 EDT 2003


OK, for those interested in testing this out, there are *two* changes to
make to the code that Sean posted.  The first is to change the regex to
include '0', and the second is to yield w rather than word.  Sean made
these changes and said that his positive results disappeared, but mine
didn't:

[the third and fourth columns are the old, inaccurate results, included
for reference]

filename:   august_no_seans  accurate_seans  august_no_seans  august_seans
ham:spam:        7900:15260      7900:15260       7900:15260    7900:15260
fp total:                 2               2                2             2
fp %:                  0.03            0.03             0.03          0.03
fn total:               176             175              176           172
fn %:                  1.15            1.15             1.15          1.13
unsure t:               501             495              501           499
unsure %:              2.16            2.14             2.16          2.15
real cost:          $296.20         $294.00          $296.20       $291.80
best cost:          $489.60         $488.80          $489.60       $488.80
h mean:                0.63            0.60             0.63          0.62
h sdev:                4.84            4.75             4.84          4.81
s mean:               94.52           94.49            94.52         94.57
s sdev:               18.67           18.70            18.67         18.56
mean diff:            93.89           93.89            93.89         93.95
k:                     3.99            4.00             3.99          4.02

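So my fn didn't go down nearly as much (176 to 175 with the corrected
code, versus 176 to 172 with the old code), but my unsures went down
more (501 to 495, versus 501 to 499).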
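
To make that comparison concrete, here is a minimal sketch that only
subtracts the fn and unsure counts copied from the table (first and
second columns for the corrected code, third and fourth for the old,
inaccurate results).  The names `results` and `drop` are made up for
illustration and are not part of spambayes.

```python
# Counts copied from the table: (without Sean's change, with Sean's change).
# "results" and "drop" are hypothetical names used only for this illustration.
results = {
    "corrected code": {"fn": (176, 175), "unsure": (501, 495)},
    "old, inaccurate code": {"fn": (176, 172), "unsure": (501, 499)},
}


def drop(before, after):
    """Return how many fewer messages fall in a category after the change."""
    return before - after


for run, counts in results.items():
    fn_drop = drop(*counts["fn"])
    unsure_drop = drop(*counts["unsure"])
    print(f"{run}: fn down {fn_drop}, unsure down {unsure_drop}")
```

With the figures above this reports a drop of 1 fn and 6 unsures for the
corrected code, against 4 fn and 2 unsures for the old results.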

=Tony Meyer


