[Spambayes] Randomized Spam Beating SpamBayes

Shawn K. Hall shawn at 12pointdesign.com
Sat Oct 21 10:46:44 CEST 2006


Thanks, Skip.

> "default_bayes_customize.ini".

It's in "C:\Program Files\SpamBayes\bin\"


> Once you find it, just add the options I mentioned
> to the [Tokenizer] section and restart.

Is there any means of directly testing that the settings applied are
actually taking effect?



> Were you the person with, like, 60,000 spams and a
> similar number of hams in your training set? Maybe
> try retraining from scratch.  I have a total of
> about 400 emails in my training set and it works
> fine.

Yes. I'm concerned about the volume of spam I might receive if I were to
start with a clean database. I get over 4,000 messages a day, and well
over half of that is spam that I receive for the express purpose of
analyzing it to train my server to filter it more effectively. Starting
with a blank database, even one that was significantly fine-tuned within
the first day, would leave literally thousands of spam messages
untrained in a single week. Right now about 20-25 spam messages make it
to my inbox each day after training with the 60k message ham+spam
archives. At 4k messages per day, with probably 2,500-3,000 of them
being spam, 20-25 means that 1% or less is getting through. I can live
with that. It's far better than having a couple hundred per day. I am
considering backing up my current database and trying a new one (with
the current one available as a fallback).
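
As a rough check of that percentage, here is a minimal sketch that
assumes only the estimates above (20-25 missed spams out of roughly
2,500-3,000 spams per day). The function name is illustrative and is
not part of SpamBayes.

```python
def leak_rate(missed_per_day, spam_per_day):
    """Fraction of the day's spam that reaches the inbox."""
    return missed_per_day / spam_per_day

# Best case: fewest misses against the largest spam volume.
best = leak_rate(20, 3000)
# Worst case: most misses against the smallest spam volume.
worst = leak_rate(25, 2500)

print(f"best case:  {best:.2%}")   # 0.67%
print(f"worst case: {worst:.2%}")  # 1.00%
```

Both ends of the estimate land at or under 1%.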


On a timely and related note, the following article was publicized by
Frisk Software today:
  http://www.secureworks.com/analysis/spamthru/

It discusses the use of virus-infected botnets for spamming.

Regards,

Shawn K. Hall
http://12PointDesign.com/



