[spambayes-dev] A query on Frequently Asked Question 4.6
Harry Sigerson
harrysigerson at ntlworld.com
Sat Oct 30 15:47:17 CEST 2004
Dear SpamBayes,
A query on FAQ 4.6
================
4.6 Do I need to keep spam after it has been trained? If so, for how long?
Once a message has been [correctly] trained there is no need to keep it around.
However, SpamBayes' accuracy is dependent upon having a "sufficient" sample
from which to make its decisions. Therefore, most users retain a fair amount of
spam in the event that they may wish to rebuild the corpus from scratch.
Of course, this begs the question: "how much is enough?" That is where the
"art" of SpamBayes meets the science. Some users keep as many as several
thousand [recent] spam (as well as a similar number of ham). That is not to say
that you won't have excellent results with a tenth (or less) of that number;
since everyone's e-mail profile is different, the requirements for training are
as well.
================
The FAQ item 4.6 above addressed something that I've wondered about since
I got SB going around July this year, that is, "Where does all the 'saved' spam
go?" <smile>.
I do have a lot of spam now: 16,054 spam and 1,725 ham.
The thing is, where do I go to cut down on the amount of spam?
I see the comment, "Warning: you have much more spam than ham - SpamBayes
works best with approximately even numbers of ham and spam." in the 'Status and
Configuration' section.
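To put that warning in numbers, here is a rough sketch in Python of the imbalance between the two counts quoted above. The variable names are only illustrative, and it assumes the ham count stays where it is and only spam is reduced.

```python
# Counts quoted in this message.
spam_trained = 16054
ham_trained = 1725

# Ratio of trained spam to trained ham.
ratio = spam_trained / ham_trained

# Number of spam that would have to go to reach an even split,
# assuming (for illustration) that the ham count is left unchanged.
excess_spam = spam_trained - ham_trained

print(f"spam:ham ratio is about {ratio:.1f}:1")
print(f"spam to drop for an even split: {excess_spam}")
```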
I'll have another look and see if I can figure out where all the spam is
located...
================
4.3 How do I train SpamBayes (forward/bounce method)?
Alternatively, when you receive an incorrectly classified message, you can
forward it to the SMTP proxy for training. If the message should have been
classified as spam, forward or bounce the message to spambayes_spam at localhost,
and if the message should have been classified as ham, forward it to
spambayes_ham at localhost. You can still review the training through the web
interface, if you wish to do so. You should ensure that the "lookup message in
cache" option is set to True/Yes before you use this.
Note that some mail clients (particularly Outlook Express) do not forward all
headers when you bounce, forward or redirect mail. We do not recommend using
the SMTP proxy with these clients.
===============
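The rule in item 4.3 for choosing a training address can be sketched as below. This is only an illustration: the function name is hypothetical and not part of SpamBayes, and only the two addresses come from the FAQ (written there as "... at localhost"). The host part, "localhost", is the conventional name for the machine the mail client itself runs on.

```python
# Training addresses as given in FAQ 4.3.
SPAM_TRAIN_ADDR = "spambayes_spam@localhost"
HAM_TRAIN_ADDR = "spambayes_ham@localhost"


def training_address(should_be_spam: bool) -> str:
    """Return the address a misclassified message would be forwarded to.

    Hypothetical helper for illustration; not a SpamBayes function.
    """
    return SPAM_TRAIN_ADDR if should_be_spam else HAM_TRAIN_ADDR


# A message that should have been classified as spam:
addr = training_address(True)
print(addr)

# The part after "@" is the host name that receives the mail.
print(addr.rpartition("@")[2])
```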
...I saw from that item 4.3 that there are locations at...
spambayes_spam at localhost
...and at...
spambayes_ham at localhost
...are these locations on my own computer, or are they located on my ISP's
server?
This is a standalone machine with a cable-modem broadband connection to
NTL...
I've just used XP's Search facility to see whether spambayes_spam at localhost
exists as a file on this machine. It didn't find one, and I didn't expect that
it would.
Are these thousands of spam actually located on my ISP's server, that is,
on NTL's server?
If so, how do I go about accessing them so that I can reduce the amount of
spam to something near the 'ham' figure of 1,725?
Excuse the long explanation.
Regards,
Harry Sigerson.