[spambayes-dev] Very small change for composite word tokenizing.

T. Alexander Popiel popiel at wolfskeep.com
Tue Aug 5 21:24:41 EDT 2003


In message:  <00b201c35ac2$7ed88460$0201a8c0 at swapwizard.com>
             "Sean True" <seant at webreply.com> writes:
>
>Not exactly a patch, but it's a one minute cut and paste. I'm theorizing
>that the memory hit is not horrendous -- mostly generates sensible
>fragments
>www.microsoft.com -> www, microsoft, com
>Very_naughty_bits -> very, naughty, bits
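
For reference, the splitting described in the quote can be sketched roughly
as below. This is only an illustration, not the actual change that was
tested: the helper name split_composite and the choice to split on any run
of non-alphanumeric characters are assumptions, and nothing here is part of
the SpamBayes tokenizer API.

```python
import re

# Assumption: any run of non-alphanumeric characters separates fragments.
_SEPARATORS = re.compile(r"[^A-Za-z0-9]+")


def split_composite(word):
    """Return the lowercased fragments of a composite word.

    Hypothetical helper for illustration only. Words that do not break
    into at least two pieces yield an empty list, so no duplicate of the
    original token is generated.
    """
    parts = [p.lower() for p in _SEPARATORS.split(word) if p]
    return parts if len(parts) > 1 else []


if __name__ == "__main__":
    for w in ("www.microsoft.com", "Very_naughty_bits"):
        print(w, "->", ", ".join(split_composite(w)))
```

The fragments would be emitted in addition to the original token, which is
where the extra memory use comes from.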

With the two fixes mentioned earlier, here are my results on 48 days of
data:

filename:  fragment  normal
ham:spam: 1978:6166 1978:6166
fp total:        1       1
fp %:         0.05    0.05
fn total:       28      25
fn %:         0.45    0.41
unsure t:      172     152
unsure %:     2.11    1.87
real cost:  $72.40  $65.40
best cost:  $44.20  $41.80
h mean:       0.25    0.27
h sdev:       3.71    3.80
s mean:      98.51   98.66
s sdev:       8.97    8.56
mean diff:   98.26   98.39
k:            7.75    7.96
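
The "real cost" and percentage rows can be checked by hand. The sketch
below assumes cost weights of $10.00 per false positive, $1.00 per false
negative and $0.20 per unsure; those weights are not stated in this
message, but they are consistent with the posted totals. The function
names are made up for the example.

```python
# Assumed cost weights in dollars (not stated in the post).
FP_WEIGHT = 10.00
FN_WEIGHT = 1.00
UNSURE_WEIGHT = 0.20


def real_cost(fp, fn, unsure):
    """Weighted cost of a run under the assumed weights."""
    return fp * FP_WEIGHT + fn * FN_WEIGHT + unsure * UNSURE_WEIGHT


def rates(fp, fn, unsure, n_ham, n_spam):
    """Return (fp %, fn %, unsure %).

    fp is taken relative to ham, fn relative to spam, and unsure
    relative to all messages.
    """
    return (
        100.0 * fp / n_ham,
        100.0 * fn / n_spam,
        100.0 * unsure / (n_ham + n_spam),
    )


if __name__ == "__main__":
    # (fp, fn, unsure) counts copied from the table.
    runs = {"fragment": (1, 28, 172), "normal": (1, 25, 152)}
    for name, (fp, fn, unsure) in runs.items():
        fp_pct, fn_pct, un_pct = rates(fp, fn, unsure, 1978, 6166)
        print(f"{name}: cost ${real_cost(fp, fn, unsure):.2f}, "
              f"fp {fp_pct:.2f}%, fn {fn_pct:.2f}%, unsure {un_pct:.2f}%")
```

Under those assumptions the fragment run comes to 10 + 28 + 34.40 = $72.40
and the normal run to 10 + 25 + 30.40 = $65.40, matching the table.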


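In other words, for me it's a significant loss: the fragment run has 3
more false negatives and 20 more unsures than the normal run, and its
real cost is $7.00 higher.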

- Alex


