[Spambayes-checkins] website faq.txt,1.1,1.2

Skip Montanaro montanaro at users.sourceforge.net
Mon Jun 2 08:40:51 EDT 2003


Update of /cvsroot/spambayes/website
In directory sc8-pr-cvs1:/tmp/cvs-serv29738

Modified Files:
	faq.txt 
Log Message:
added a new q&a from Bill Parducci about how long to keep training data.


Index: faq.txt
===================================================================
RCS file: /cvsroot/spambayes/website/faq.txt,v
retrieving revision 1.1
retrieving revision 1.2
diff -C2 -d -r1.1 -r1.2
*** faq.txt	31 May 2003 01:32:46 -0000	1.1
--- faq.txt	2 Jun 2003 14:40:48 -0000	1.2
***************
*** 432,435 ****
--- 432,452 ----
  
  
+ Do I need to keep spam after it has been trained? If so, for how long?
+ ----------------------------------------------------------------------
+ 
+ Once a message has been [correctly] trained on, there is no need to keep it
+ around. However, Spambayes' accuracy depends on having a "sufficient"
+ sample from which to make its decisions. Most users therefore retain a fair
+ amount of spam in case they ever want to rebuild the corpus from
+ scratch.
+ 
+ Of course, this raises the question: "how much is enough?" That is where the
+ "art" of Spambayes meets the science. Some users keep several thousand
+ recent spam (and a similar number of ham). That is not to say that you
+ won't get excellent results with a tenth (or less) of that number; since
+ everyone's e-mail profile is different, the training requirements differ
+ as well.
+ 
+ 
  Why did Spambayes mark this obvious spam "unsure"?
  --------------------------------------------------




