[Spambayes] Randomized Spam Beating SpamBayes
Shawn K. Hall
shawn at 12pointdesign.com
Sat Oct 21 10:46:44 CEST 2006
Thanks, Skip.
> "default_bayes_customize.ini".
It's in "C:\Program Files\SpamBayes\bin\"
> Once you find it, just add the options I mentioned
> to the [Tokenizer] section and restart.
Is there any means of directly testing that the settings applied are
actually taking effect?
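One indirect check, offered only as a minimal sketch: read the customization file back and list what its [Tokenizer] section contains. This assumes the file is plain INI that Python's configparser can parse, and the helper name read_section is made up for illustration. It only confirms what is written in the file, not that SpamBayes has loaded those values after the restart.

```python
import configparser
from pathlib import Path

# Path assembled from the directory and file name quoted in this thread.
INI_PATH = Path(r"C:\Program Files\SpamBayes\bin") / "default_bayes_customize.ini"


def read_section(path, section="Tokenizer"):
    """Return the options the file sets in `section` as a dict (empty if absent).

    Hypothetical helper: it inspects the file only, not the running filter.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names as written (default lowercases them)
    # read() silently skips files it cannot open, so check what was loaded.
    loaded = parser.read(path, encoding="utf-8")
    if not loaded or not parser.has_section(section):
        return {}
    return dict(parser.items(section))


if __name__ == "__main__":
    for name, value in read_section(INI_PATH).items():
        print(f"{name} = {value}")
```

An empty result means either the file was not found at that path or it has no [Tokenizer] section yet.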
> Were you the person with, like, 60,000 spams and a
> similar number of hams in your training set? Maybe
> try retraining from scratch. I have a total of
> about 400 emails in my training set and it works
> fine.
Yes. I'm concerned about the volume of spam I might receive if I were to
try starting with a clean database. I get over 4,000 messages a day,
with well over half of that being spam that I receive with the express
purpose of analyzing spam to train my server to more effectively filter
it. Starting with a blank database, even if it were significantly
fine-tuned within the first day, would leave literally thousands of spam
messages untrained in a single week. Right now I'm having about 20-25
spam messages make it to my inbox each day after training with the 60k
message ham+spam archives. At 4k messages per day, with probably 2500-3000
of them being spam, 20-25 getting through is at or below 1%. I can
live with that. It's far better than having a couple hundred per day. I
am considering backing up my current database and trying a new one (with
the current one available as a fallback).
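As a rough check on the percentages above, the arithmetic can be sketched as follows. The function name leak_rate is just for illustration, and the inputs are the estimates quoted in this message, not measured values.

```python
def leak_rate(missed_per_day, spam_per_day):
    """Fraction of daily spam that reaches the inbox."""
    return missed_per_day / spam_per_day


# Best case: fewest misses against the most spam; worst case: the reverse.
low = leak_rate(20, 3000)
high = leak_rate(25, 2500)

print(f"{low:.2%} to {high:.2%}")  # prints "0.67% to 1.00%"
```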
On a very timely related note, the following article was publicized by
Frisk Software today:
http://www.secureworks.com/analysis/spamthru/
It discusses the use of virus-infected botnets for spamming.
Regards,
Shawn K. Hall
http://12PointDesign.com/