[Spambayes] Any prospect of spambayes working with qmail?

Thu Feb 13 14:58:00 EST 2003

Skip Montanaro <skip at pobox.com> writes:

> I do something similar on a smaller scale on my mail server.  My
> wife's online interests are essentially a proper subset of mine, so I
> use the same training set for both of us.  I have her procmail setup
> direct marked-as-spam messages to me.  She gets everything else.  I've
> heard no complaints from her so far.  In fact, she doesn't even know I
> have things set up this way. ;-) She just gets a lot less spam.

What Skip is describing here is essentially what we're planning to
implement at $FIRM.  You buy $FIRM's firewall appliance, put it on your
network, and give it an address to send suspected spam to.  The person
at that address (the spammaster) goes through and weeds out any false
positives, re-sending them to a special address.  The appliance then
retrains on its mistake and mails the original message to the original
recipient.
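
To make that correction loop concrete, here is a rough sketch in
Python.  Nothing here is SpamBayes or $FIRM code: ToyClassifier,
handle_false_positive and the deliver callback are made-up names, and
the "classifier" is just a pair of token counters standing in for a
real one.

```python
from collections import Counter


class ToyClassifier:
    """Hypothetical stand-in for a learning filter: per-class token counts."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}

    def train(self, text, label):
        # Naive whitespace tokenizer; a real filter tokenizes far more carefully.
        self.counts[label].update(text.lower().split())


def handle_false_positive(classifier, message, deliver):
    """Handle a message the spammaster re-sent to the special address.

    message is assumed to be a dict with 'to' (original recipient) and
    'body' keys.  deliver is any callable taking (recipient, body).
    """
    # Retrain on the mistake: this message was really ham.
    classifier.train(message["body"], "ham")
    # Then pass the original message on to whoever it was meant for.
    deliver(message["to"], message["body"])
```

In practice deliver would hand the message to the local MTA; in the
sketch it can be any function, which also makes the loop easy to test.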

The problem with this setup is that if it gets a false negative, the
end-user must forward that message back to the classifier to be trained
as spam.  This is a really big problem, since by the time Outlook gets
its grubby hands on it, the original message is irreparably damaged.
You can still get a lot of useful information out of it in this mangled
state, but whether or not it's enough information remains to be seen.

You can probably set up a pretty good wordlist by training on a week's
worth of collected ham and spam--less if you're a bigger site.  But
unless you constantly retrain it, your accuracy will gradually degrade.
You have to keep retraining the classifier as your spam and ham change
in nature.
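
One way to keep the wordlist current (only one of several, and not
necessarily what any existing filter does) is to rebuild it from a
sliding window of recently labelled mail.  The sketch below assumes
messages are added in the order they arrived; RollingTrainer and the
seven-day default are illustrative choices, not anything from
SpamBayes.

```python
from collections import Counter, deque
from datetime import timedelta


class RollingTrainer:
    """Hypothetical helper: rebuild token counts from recent mail only."""

    def __init__(self, max_age_days=7):
        self.max_age = timedelta(days=max_age_days)
        # (received_on, label, text) tuples, oldest first.
        self.messages = deque()

    def add(self, received_on, label, text):
        # Assumes callers add messages in chronological order.
        self.messages.append((received_on, label, text))

    def rebuild(self, today):
        # Drop anything that has aged out of the window.
        while self.messages and today - self.messages[0][0] > self.max_age:
            self.messages.popleft()
        # Recount tokens from what is left.
        counts = {"spam": Counter(), "ham": Counter()}
        for _, label, text in self.messages:
            counts[label].update(text.lower().split())
        return counts
```

Calling rebuild() from a nightly job would let old spam vocabulary
fall out of the wordlist as new mail comes in, but it still depends on
somebody labelling that new mail.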

It's hard to make a learning classifier work for a big site, since by
its very nature it must get feedback on how it's doing, and most people
don't have the patience to train a mail filter--they just want to read
their email and get on with their lives.  So far, this ease-of-use
question has been answered by trying to integrate the filter into the
client.  A user probably won't mind (in fact, most of them probably
relish) hitting a scowling yellow face for "delete as spam".  A user
will probably be more reluctant to take the time to forward all their
spam to a special address.  This is why our focus has been on the client
and not the server, and this is why everyone keeps telling you to use
SpamAssassin (which only requires feedback in the form of installing the
latest version*).

But feel free to point out why I'm wrong.  I *want* to be wrong on this
one :)

Neale

* Okay, so SA has learning aspects to it, but you don't *have* to use
  them to get good results.  With SpamBayes, if you don't use the
  learning stuff, you get a useless filter.