[Spambayes-checkins] spambayes NEWTRICKS.txt,1.6,1.7

Tony Meyer anadelonbrin at users.sourceforge.net
Sun Dec 28 20:01:50 EST 2003


Update of /cvsroot/spambayes/spambayes
In directory sc8-pr-cvs1:/tmp/cvs-serv11595

Modified Files:
	NEWTRICKS.txt 
Log Message:
Add comments from Skip on spambayes-dev.


Index: NEWTRICKS.txt
===================================================================
RCS file: /cvsroot/spambayes/spambayes/NEWTRICKS.txt,v
retrieving revision 1.6
retrieving revision 1.7
diff -C2 -d -r1.6 -r1.7
*** NEWTRICKS.txt	13 Oct 2003 21:44:27 -0000	1.6
--- NEWTRICKS.txt	29 Dec 2003 01:01:46 -0000	1.7
***************
*** 30,33 ****
--- 30,51 ----
    helpful to try stripping punctuation.  (Idea from Paul Sorenson)
  
+   [skip] I tried the first (eliding punctuation from words).  In testing
+   it turned out not to be all that useful, I think for a couple of
+   reasons:
+ 
+   * There are plenty of other spammy clues in such messages which are
+     sufficient to kick these messages into spam range.  Most of this stuff
+     winds up scoring at 0.95 or above for me.  If they don't score as spam
+     for you, train on a few and see how it does then.
+ 
+   * Training databases full of old-ish mail won't contain many of these
+     sorts of messages, so enabling punctuation removal won't change things
+     very much.
+ 
+   [tony] I tried Skip's patch and got basically the same results, and
+   his reasoning above matches my experience, too.  On the other hand, I
+   am getting more of these messages now, so my corpus is changing
+   (they're still classified as spam without this, though).
+  
  - Similarly, some letters get replaced by numbers, e.g.: "V1agra" instead of
    "Viagra".  Mapping numbers to suitable letters might help in some

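For readers who want to try the two ideas discussed in this hunk (eliding
punctuation inside words, and mapping digits back to letters), here is a
minimal sketch.  It is not Skip's patch and not the SpamBayes tokenizer:
the function names, the "elided:" and "mapped:" token prefixes, and the
digit-to-letter table are all assumptions made for illustration only.

```python
import re

# Hypothetical helpers for illustration; none of these names come from SpamBayes.

# One or more punctuation characters sitting between two word characters,
# as in "V.i.a.g.r.a" or "V-i-a-g-r-a".
_INNER_PUNCT = re.compile(r"(?<=\w)[^\w\s]+(?=\w)")

# Assumed look-alike table; a real mapping would need tuning against a corpus.
_DIGIT_TO_LETTER = str.maketrans(
    {"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t"}
)


def strip_inner_punctuation(word):
    """Remove punctuation embedded inside a word; leading/trailing stays."""
    return _INNER_PUNCT.sub("", word)


def map_digits(word):
    """Map digits to look-alike letters, but only in mixed letter/digit words."""
    has_alpha = any(c.isalpha() for c in word)
    has_digit = any(c.isdigit() for c in word)
    if has_alpha and has_digit:
        return word.translate(_DIGIT_TO_LETTER)
    return word  # pure numbers such as "2003" are left alone


def extra_tokens(text):
    """Yield extra, prefixed tokens for words changed by either transform."""
    for word in text.split():
        stripped = strip_inner_punctuation(word)
        if stripped != word:
            yield "elided:" + stripped.lower()
        mapped = map_digits(stripped)
        if mapped != stripped:
            yield "mapped:" + mapped.lower()
```

Under these assumptions, list(extra_tokens("Buy V.1.a.g.r.a now")) gives
['elided:v1agra', 'mapped:viagra'].  The sketch emits the normalized forms
as additional tokens rather than replacing the originals, so the obfuscated
spelling can still act as a clue of its own.  Note that ordinary words such
as "don't" are also affected by the punctuation rule (producing
"elided:dont"), which is one reason any such change should be checked
against a test corpus before being enabled.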



