[Spambayes] Spam Prefiltering

David Abrahams dave at boost-consulting.com
Sat May 22 21:36:23 EDT 2004

"Michael C. Neel" <neel at mediapulse.com> writes:

> I'd recommend letting all mail go to the inbox, even better if the
> other filter is turned off completely.  Blacklisting is bad, FAQ and
> threads on this about, lol.

Mike, thanks for your reply.

I may be misreading you, and I mean no offense, but this response
seems a bit facile (lol).  Let me try to explain why:

AFAICT from the FAQ, blacklisting isn't "bad" per se.  The FAQ does
say that it's vulnerable to spoofing, but that doesn't seem like a big
problem as long as I have SpamBayes cleaning up the dregs.  The FAQ
also says that "blacklisting is really a server side responsibility".
Well, that's where it's happening.  SpamBayes also happens to be
running on the server, but to get that working my scripts have to mail
each message to me again after classification so it can be processed
accordingly (a CommuniGate Pro limitation).  If the blacklist can throw
out a bunch of spam without doing that, it would seem to impose a much
lighter load on the already-slow server.
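
To make the load argument concrete, here is a rough sketch of the
ordering I have in mind.  It is not my actual server script: the
blacklist entries are made up, and `classify` is just a placeholder
for whatever hands a message to SpamBayes (it is not a SpamBayes API).
The only point is that a blacklisted message is dropped before the
expensive classify-and-remail step ever runs.

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical blacklist; real entries would come from the server's own list.
BLACKLIST = {"spammer@example.com", "bulk@example.net"}


def is_blacklisted(raw_message, blacklist=BLACKLIST):
    """Return True if the message's From address is on the blacklist."""
    msg = message_from_string(raw_message)
    _, addr = parseaddr(msg.get("From", ""))
    return addr.lower() in blacklist


def prefilter(raw_messages, classify):
    """Drop blacklisted mail first; hand only the rest to the classifier.

    `classify` is an assumed callable standing in for the SpamBayes step
    (and, in my setup, the re-mailing that follows it).
    """
    dropped, results = [], []
    for raw in raw_messages:
        if is_blacklisted(raw):
            # Cheap path: no classification, no second delivery.
            dropped.append(raw)
        else:
            results.append((raw, classify(raw)))
    return dropped, results
```

Note that anything dropped this way never reaches the classifier at
all, so whether it should also be fed to training is a separate
decision.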

> Then train on a balanced set of spam/ham (I do 100 each), and only train
> on the ones SpamBayes gets wrong.

Do you mean that I should try classifying messages with an untrained
SpamBayes, and then only train on the ones it got wrong?  I was under
the impression that before training everything would be classified as
"unsure".  You know, I've been using SpamBayes successfully for quite
some time now; I only just found out that I'm automatically training
on a whole lot more spam than I thought because of the blacklist.

> Check on the database every now and then, and if the numbers get too
> far off balance

What numbers?  What does "off balance" mean?

> add some more spam/ham to the training to balance it out.
> SB currently handles an account I get over 400 spams a day, and with the
> above message I see maybe a few emails a day in suspects, and occasionally
> my brother emails me an eBay link I have to get out of the spam folder, lol.

I've been doing that well with my setup already; I'm just trying to
figure out what to do with the blacklist info.

