[Spambayes-checkins] spambayes NEWTRICKS.txt,1.6,1.7
Tony Meyer
anadelonbrin at users.sourceforge.net
Sun Dec 28 20:01:50 EST 2003
Update of /cvsroot/spambayes/spambayes
In directory sc8-pr-cvs1:/tmp/cvs-serv11595
Modified Files:
NEWTRICKS.txt
Log Message:
Add comments from Skip on spambayes-dev.
Index: NEWTRICKS.txt
===================================================================
RCS file: /cvsroot/spambayes/spambayes/NEWTRICKS.txt,v
retrieving revision 1.6
retrieving revision 1.7
diff -C2 -d -r1.6 -r1.7
*** NEWTRICKS.txt 13 Oct 2003 21:44:27 -0000 1.6
--- NEWTRICKS.txt 29 Dec 2003 01:01:46 -0000 1.7
***************
*** 30,33 ****
--- 30,51 ----
helpful to try stripping punctuation. (Idea from Paul Sorenson)
+ [skip] I tried the first (eliding punctuation from words). From a testing
+ standpoint it turns out to not be all that useful, I think for a couple
+ reasons:
+
+ * There are plenty of other spammy clues in such messages which are
+ sufficient to kick these messages into spam range. Most of this stuff
+ winds up scoring at 0.95 or above for me. If they don't score as spam
+ for you, train on a few and see how it does then.
+
+ * Training databases full of old-ish mail won't contain many of these
+ sorts of messages, so enabling punctuation removal won't change things
+ very much.
+
+ [tony] I tried Skip's patch and got basically the same results, and
+ his reasoning above sounds right for my experience, too. OTOH, I am
+ getting more of these messages now, so my corpus is changing (they're
+ still classified as spam without this, though).
+
- Similarly, some letters get replaced by numbers, e.g.: "V1agra" instead of
"Viagra". Mapping numbers to suitable letters might help in some
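
For illustration only, the two normalisations discussed in this hunk (eliding punctuation from words, and mapping digits to look-alike letters) could be sketched roughly as below. This is not Skip's patch and not the SpamBayes tokenizer; the function names and the digit table are hypothetical, chosen just to show the idea.

```python
import re

# Hypothetical look-alike table; which digits map to which letters is an
# assumption made for this sketch, not something taken from SpamBayes.
DIGIT_TO_LETTER = str.maketrans(
    {"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t"}
)

# Anything that is neither a word character nor whitespace, plus "_".
_PUNCT = re.compile(r"[^\w\s]|_")


def elide_punctuation(word):
    """Drop punctuation embedded in a word, e.g. 'V.i.a.g.r.a'."""
    return _PUNCT.sub("", word)


def map_digits(word):
    """Map digits to letters, but only in words mixing letters and digits.

    Purely numeric tokens (years, prices) are left untouched.
    """
    has_alpha = any(c.isalpha() for c in word)
    has_digit = any(c.isdigit() for c in word)
    if has_alpha and has_digit:
        return word.translate(DIGIT_TO_LETTER)
    return word


def normalize(word):
    """Apply both tricks, then lowercase the result."""
    return map_digits(elide_punctuation(word)).lower()


if __name__ == "__main__":
    for token in ("V.i.a.g.r.a", "V1agra", "2003"):
        print(token, "->", normalize(token))
```

As the comments in the diff suggest, whether tokens normalised this way help in practice depends on the training corpus, so a sketch like this would need testing against real mail before drawing conclusions.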