[Spambayes-checkins] spambayes tokenizer.py,1.7,1.8
Tim Peters
tim_one@users.sourceforge.net
Sun, 08 Sep 2002 14:29:07 -0700
Update of /cvsroot/spambayes/spambayes
In directory usw-pr-cvs1:/tmp/cvs-serv10235
Modified Files:
tokenizer.py
Log Message:
Fixed grammar in a comment, mostly as an excuse to post the new rates I
forgot to include with the last checkin (which simplified parsing of email
addresses):
false positive percentages (columns: before, after, outcome)
0.000 0.000 tied
0.000 0.000 tied
0.050 0.100 lost +100.00%
0.025 0.025 tied
0.075 0.050 won -33.33%
0.000 0.000 tied
0.075 0.075 tied
0.075 0.050 won -33.33%
0.025 0.025 tied
0.025 0.025 tied
0.050 0.050 tied
0.050 0.050 tied
0.050 0.050 tied
0.000 0.000 tied
0.000 0.000 tied
0.075 0.075 tied
0.025 0.025 tied
0.000 0.000 tied
0.025 0.025 tied
0.050 0.100 lost +100.00%
won 2 times
tied 16 times
lost 2 times
total unique fp went from 12 to 14 lost +16.67%
false negative percentages (columns: before, after, outcome)
0.327 0.291 won -11.01%
0.400 0.364 won -9.00%
0.364 0.254 won -30.22%
0.691 0.582 won -15.77%
0.545 0.545 tied
0.291 0.218 won -25.09%
0.291 0.218 won -25.09%
0.618 0.654 lost +5.83%
0.436 0.364 won -16.51%
0.327 0.255 won -22.02%
0.364 0.400 lost +9.89%
0.691 0.654 won -5.35%
0.618 0.618 tied
0.291 0.291 tied
0.291 0.291 tied
0.436 0.436 tied
0.473 0.436 won -7.82%
0.218 0.218 tied
0.291 0.255 won -12.37%
0.254 0.182 won -28.35%
won 12 times
tied 6 times
lost 2 times
total unique fn went from 110 to 101 won -8.18%
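
For anyone checking the won/tied/lost columns by hand, here is a minimal sketch of how each verdict and percentage can be reproduced from a before/after pair. The names compare and tally are made up for illustration and are not taken from the spambayes sources; the sketch also works from the rounded rates printed above, which may not be how the real comparison script does it.

```python
from collections import Counter


def compare(before, after):
    """Return (verdict, percent_change) for one before/after error rate.

    Hypothetical helper, not part of spambayes.  A lower rate after the
    change counts as "won", a higher one as "lost".
    """
    if before == after:
        return "tied", 0.0
    if before == 0:
        # Assumption: the relative change from a zero rate is undefined,
        # so report it as infinite rather than dividing by zero.
        return "lost", float("inf")
    change = (after - before) / before * 100.0
    return ("won" if after < before else "lost"), change


def tally(pairs):
    """Count the verdicts over a sequence of (before, after) pairs."""
    return Counter(compare(before, after)[0] for before, after in pairs)


# The 20 false positive pairs, copied from the table above.
FP_PAIRS = [
    (0.000, 0.000), (0.000, 0.000), (0.050, 0.100), (0.025, 0.025),
    (0.075, 0.050), (0.000, 0.000), (0.075, 0.075), (0.075, 0.050),
    (0.025, 0.025), (0.025, 0.025), (0.050, 0.050), (0.050, 0.050),
    (0.050, 0.050), (0.000, 0.000), (0.000, 0.000), (0.075, 0.075),
    (0.025, 0.025), (0.000, 0.000), (0.025, 0.025), (0.050, 0.100),
]

if __name__ == "__main__":
    for before, after in FP_PAIRS:
        verdict, change = compare(before, after)
        if verdict == "tied":
            print(f"{before:.3f} {after:.3f} tied")
        else:
            print(f"{before:.3f} {after:.3f} {verdict} {change:+.2f}%")
    counts = tally(FP_PAIRS)
    for verdict in ("won", "tied", "lost"):
        print(f"{verdict} {counts[verdict]} times")
```

The same compare call applied to the totals, compare(12, 14) and compare(110, 101), gives the +16.67% and -8.18% figures in the summary lines.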
Index: tokenizer.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/tokenizer.py,v
retrieving revision 1.7
retrieving revision 1.8
diff -C2 -d -r1.7 -r1.8
*** tokenizer.py 8 Sep 2002 21:08:16 -0000 1.7
--- tokenizer.py 8 Sep 2002 21:29:05 -0000 1.8
***************
*** 604,609 ****
# all the charsets
#
! # This has huge benefit for the f-n rate, and virtually none on the f-p rate,
! # although it does reduce the variance of the f-p rate across different
# training sets (really marginal msgs, like a brief HTML msg saying just
# "unsubscribe me", are almost always tagged as spam now; before they were
--- 604,609 ----
# all the charsets
#
! # This has huge benefit for the f-n rate, and virtually no effect on the f-p
! # rate, although it does reduce the variance of the f-p rate across different
# training sets (really marginal msgs, like a brief HTML msg saying just
# "unsubscribe me", are almost always tagged as spam now; before they were