[spambayes-dev] A query on Frequently Asked Question 4.6

Harry Sigerson
Sat Oct 30 15:47:17 CEST 2004

Dear SpamBayes, 
     A query on FAQ 4.6 
4.6   Do I need to keep spam after it has been trained? If so, for how long?

Once a message has been [correctly] trained there is no need to keep it around. 
However, SpamBayes' accuracy is dependent upon having a "sufficient" sample 
from which to make its decisions. Therefore, most users retain a fair amount of 
spam in the event that they may wish to rebuild the corpus from scratch.

Of course, this begs the question: "how much is enough?" That is where the 
"art" of SpamBayes meets the science. Some users keep as many as several 
thousand [recent] spam (as well as a similar number of ham). That is not to say 
that you won't have excellent results with a tenth (or less) of that number; 
since everyone's e-mail profile is different, the requirements for training are 
as well.
     The FAQ item 4.6 above addressed something that I've wondered about since 
I got SB going around July this year, that is, "Where does all the 'saved' spam 
go?" <smile>. 
     I do have a lot of spam now: 16,054 spams and 1,725 hams. 
     The thing is, where do I go to cut down on the amount of spam? 
     I see the comment, "Warning: you have much more spam than ham - SpamBayes 
works best with approximately even numbers of ham and spam." in the 'Status and 
Configuration' section.
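
To put numbers on that warning, here is a minimal arithmetic sketch using the two counts quoted above. The function name `balance_report` is made up for illustration and is not part of SpamBayes; it only shows how lopsided the totals are and how many spam would have to go for an even split.

```python
def balance_report(n_spam, n_ham):
    """Return (spam-to-ham ratio, number of spam above the ham count)."""
    ratio = n_spam / n_ham
    # Never report a negative excess if ham outnumbers spam.
    excess = max(n_spam - n_ham, 0)
    return ratio, excess


# Counts taken from the message above.
ratio, excess = balance_report(16054, 1725)
print(f"spam:ham ratio is about {ratio:.1f}:1")
print(f"spam above the ham count: {excess}")
```

With these figures the ratio is roughly 9.3 to 1, and 14,329 spam would have to be dropped to match the ham count exactly.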
     I'll have another look and see if I can figure out where all the spam is 

4.3   How do I train SpamBayes (forward/bounce method)?

Alternatively, when you receive an incorrectly classified message, you can 
forward it to the SMTP proxy for training. If the message should have been 
classified as spam, forward or bounce the message to spambayes_spam at localhost, 
and if the message should have been classified as ham, forward it to 
spambayes_ham at localhost. You can still review the training through the web 
interface, if you wish to do so. You should ensure that the "lookup message in 
cache" option is set to True/Yes before you use this.

Note that some mail clients (particularly Outlook Express) do not forward all 
headers when you bounce, forward or redirect mail. We do not recommend using 
the SMTP proxy with these clients.
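
As a rough illustration of the bounce method that FAQ item describes, the sketch below resends a raw message, headers intact, to one of the two training addresses using Python's standard smtplib. It assumes the "at" in the archived text stands for "@", and that the SpamBayes SMTP proxy is listening on a host and port you supply; the function names, the `sender` value and the port are placeholders, not SpamBayes settings.

```python
import smtplib

# Training addresses from FAQ 4.3 ("at" read as "@"; an assumption).
SPAM_ADDRESS = "spambayes_spam@localhost"
HAM_ADDRESS = "spambayes_ham@localhost"


def training_address(should_be_spam):
    """Pick the training address for a misclassified message."""
    return SPAM_ADDRESS if should_be_spam else HAM_ADDRESS


def bounce_for_training(raw_message, should_be_spam, sender, host, port):
    """Resend the original message bytes, unmodified, to the proxy.

    host and port must point at wherever your SMTP proxy listens;
    they are not defaults taken from SpamBayes.
    """
    with smtplib.SMTP(host, port) as smtp:
        smtp.sendmail(sender, [training_address(should_be_spam)], raw_message)


# Hypothetical usage (requires a running proxy, so it is left commented out):
# with open("misclassified.eml", "rb") as f:
#     bounce_for_training(f.read(), True, "me@example.com", "localhost", 8025)
```

The point to notice is that both addresses end in "localhost", which names the machine the mail program is running on; the mail is handed to the proxy rather than stored as a file under that name.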
     ...I saw from that item 4.3 that there are locations at...
     spambayes_spam at localhost
     ...and at...
     spambayes_ham at localhost
     ...are these locations on my own computer, or are they located at my ISP's 
     This is a standalone machine with a cable-modem broadband connection to 
     I've just used XP's Search facility to see whether "spambayes_spam at 
localhost" exists as a file on this machine. It didn't find it, and I didn't 
expect that it would. 
     Are these thousands of spam actually located on my ISP's server, that is, 
on NTL's server? 
     If so, how do I go about accessing them so that I can reduce the amount 
of spam to something near the 'ham' figure of 1,725? 
     Excuse the long explanation. 
     Harry Sigerson. 

