[Spambayes] Milter wrinkles

Thu Nov 14 03:19:56 2002

>>> "Stuart D. Gathman" wrote
> First, there is the difficulty of statistics being preferably user
> specific.  Is this a show stopper for this kind of filtering at the milter
> level?  How could the system get feedback from the users?  Is this simply
> an inappropriate thing to do at this level?

It depends on how closely coupled your users' interests are. You will
need to do ham training on representative emails from all users - otherwise
you could end up, say, with one of your users being interested in playing
the bass guitar, and suddenly all of your users would be getting spam from
those steenking bass guitar manufacturers. I guess it would be possible to
have separate databases for each user, and pick the right one when you get
the RCPT TO: command.
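
A rough sketch of the per-user idea, in Python. Everything named here is
hypothetical (the database file names, the user_dbs mapping, and the rule
that a message with more than one recipient falls back to a shared
database); it only illustrates choosing a database from the RCPT TO:
arguments, not how any real milter does it.

```python
SHARED_DB = "shared.db"  # hypothetical site-wide database


def normalize_rcpt(rcpt):
    """Strip angle brackets and lowercase an RCPT TO: argument."""
    addr = rcpt.strip()
    if addr.startswith("<") and addr.endswith(">"):
        addr = addr[1:-1]
    # Assumption: addresses are treated as case-insensitive.
    return addr.lower()


def db_for_recipients(rcpts, user_dbs, shared_db=SHARED_DB):
    """Pick the training database to score a message against.

    rcpts    -- the RCPT TO: arguments seen for this message
    user_dbs -- hypothetical mapping of address -> per-user database
    """
    addrs = {normalize_rcpt(r) for r in rcpts}
    if len(addrs) == 1:
        (addr,) = addrs
        # Users without their own database share the common one.
        return user_dbs.get(addr, shared_db)
    # Assumption: one message can only get one verdict, so mail for
    # several recipients is scored against the shared database.
    return shared_db
```

For example, db_for_recipients(["<Alice@Example.com>"],
{"alice@example.com": "alice.db"}) picks "alice.db", while a message to
two recipients gets the shared database.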

> Second, a milter would like to hang up on spammers as soon as possible.  
> This is why a blacklist of spam domains is valuable - although it only 
> stops a small percentage, they are stopped immediately before many 
> resources are used.

No-one's really done that much work on this yet. I think GregW has the
python.org mailer set up so it grabs the entire message, checks it with
SpamAssassin, and if it's completely spammy, it returns an error and
drops the message. Greg can probably fill in more details here.

> I had the thought that the bayesian analysis could be applied to the 
> headers only.  Then, email with very spammy headers could be rejected 
> without bothering with the body.  I'll have to experiment with how 
> effective this is.

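There's been some work in this area, but not an enormous amount. If you're
letting them get past the SMTP DATA command, you may as well pull down the
entire message rather than just the headers: once you've accepted DATA, you
can't send a reply until the sender has finished transmitting the whole
message anyway.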
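
If you want to experiment with headers-only scoring, something like the
following might be a starting point. It is a toy: the tokenizer is made
up, the token probabilities are supplied by the caller, and the combining
step is plain naive Bayes rather than whatever combining scheme Spambayes
itself uses. It only shows that the body never has to be looked at.

```python
import email
import math


def header_tokens(raw_message):
    """Return 'headername:word' tokens from the headers only."""
    msg = email.message_from_string(raw_message)
    tokens = []
    for name, value in msg.items():
        for word in str(value).lower().split():
            tokens.append(f"{name.lower()}:{word}")
    return tokens


def spam_probability(tokens, token_probs, unknown=0.5):
    """Combine per-token spam probabilities with naive Bayes.

    token_probs is a hypothetical mapping of token -> P(spam | token),
    with every value strictly between 0 and 1. Tokens not in the
    mapping get the neutral value `unknown`.
    """
    log_spam = 0.0
    log_ham = 0.0
    for tok in set(tokens):
        p = token_probs.get(tok, unknown)
        log_spam += math.log(p)
        log_ham += math.log(1.0 - p)
    # Equivalent to prod(p) / (prod(p) + prod(1 - p)), in log space.
    return 1.0 / (1.0 + math.exp(log_ham - log_spam))
```

A milter could call header_tokens() on what it has collected by the end
of the headers and reject only when spam_probability() comes out very
close to 1, leaving everything else for a full-message check.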

Anthony