[Spambayes-checkins] spambayes tokenizer.py,1.7,1.8

Tim Peters tim_one@users.sourceforge.net
Sun, 08 Sep 2002 14:29:07 -0700


Update of /cvsroot/spambayes/spambayes
In directory usw-pr-cvs1:/tmp/cvs-serv10235

Modified Files:
	tokenizer.py 
Log Message:
Fixed grammar in a comment, mostly as an excuse to post the new rates I
forgot to include with the last checkin (which simplified parsing of email
addresses):

false positive percentages (before, after, result, relative change)
    0.000  0.000  tied
    0.000  0.000  tied
    0.050  0.100  lost  +100.00%
    0.025  0.025  tied
    0.075  0.050  won    -33.33%
    0.000  0.000  tied
    0.075  0.075  tied
    0.075  0.050  won    -33.33%
    0.025  0.025  tied
    0.025  0.025  tied
    0.050  0.050  tied
    0.050  0.050  tied
    0.050  0.050  tied
    0.000  0.000  tied
    0.000  0.000  tied
    0.075  0.075  tied
    0.025  0.025  tied
    0.000  0.000  tied
    0.025  0.025  tied
    0.050  0.100  lost  +100.00%

won   2 times
tied 16 times
lost  2 times

total unique fp went from 12 to 14 lost   +16.67%

false negative percentages (before, after, result, relative change)
    0.327  0.291  won    -11.01%
    0.400  0.364  won     -9.00%
    0.364  0.254  won    -30.22%
    0.691  0.582  won    -15.77%
    0.545  0.545  tied
    0.291  0.218  won    -25.09%
    0.291  0.218  won    -25.09%
    0.618  0.654  lost    +5.83%
    0.436  0.364  won    -16.51%
    0.327  0.255  won    -22.02%
    0.364  0.400  lost    +9.89%
    0.691  0.654  won     -5.35%
    0.618  0.618  tied
    0.291  0.291  tied
    0.291  0.291  tied
    0.436  0.436  tied
    0.473  0.436  won     -7.82%
    0.218  0.218  tied
    0.291  0.255  won    -12.37%
    0.254  0.182  won    -28.35%

won  12 times
tied  6 times
lost  2 times

total unique fn went from 110 to 101 won     -8.18%
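
For readers checking the arithmetic in the tables above, here is a minimal
sketch of how each row's verdict and relative change can be reproduced. It is
not the project's comparison script; the names compare() and tally() are
hypothetical, and it assumes that a lower rate counts as "won" and that the
percentage is the change relative to the first column.

```python
from collections import Counter

def compare(before, after):
    """Return (verdict, percent_change) for one before/after error-rate pair.

    Lower is better. percent_change is None when the rates are equal or
    when `before` is zero, since the relative change is undefined there.
    """
    if after == before:
        return "tied", None
    verdict = "won" if after < before else "lost"
    if before == 0:
        return verdict, None
    return verdict, (after - before) / before * 100.0

def tally(pairs):
    """Count won/tied/lost verdicts over a sequence of (before, after) pairs."""
    return Counter(compare(b, a)[0] for b, a in pairs)

# A few rows copied from the false positive table above.
fp_rows = [(0.050, 0.100), (0.025, 0.025), (0.075, 0.050)]
for before, after in fp_rows:
    verdict, pct = compare(before, after)
    change = "" if pct is None else "%+8.2f%%" % pct
    print("%9.3f %6.3f  %-5s%s" % (before, after, verdict, change))
print(dict(tally(fp_rows)))
```

The same function covers the "total unique" lines, for example
compare(110, 101) for the false negative total and compare(12, 14) for the
false positive total.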


Index: tokenizer.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/tokenizer.py,v
retrieving revision 1.7
retrieving revision 1.8
diff -C2 -d -r1.7 -r1.8
*** tokenizer.py	8 Sep 2002 21:08:16 -0000	1.7
--- tokenizer.py	8 Sep 2002 21:29:05 -0000	1.8
***************
*** 604,609 ****
  #    all the charsets
  #
! # This has huge benefit for the f-n rate, and virtually none on the f-p rate,
! # although it does reduce the variance of the f-p rate across different
  # training sets (really marginal msgs, like a brief HTML msg saying just
  # "unsubscribe me", are almost always tagged as spam now; before they were
--- 604,609 ----
  #    all the charsets
  #
! # This has huge benefit for the f-n rate, and virtually no effect on the f-p
! # rate, although it does reduce the variance of the f-p rate across different
  # training sets (really marginal msgs, like a brief HTML msg saying just
  # "unsubscribe me", are almost always tagged as spam now; before they were