[Spambayes] Can we give recovered non-spam more priority ?

Jeremy Hylton jeremy at alum.mit.edu
Fri Jun 27 18:30:07 EDT 2003


On Fri, 2003-06-27 at 12:59, Murnaghan, Tim wrote:
> My setup involves my work e-mail address having leaked out to spam.
> I get around 10 spams a day. About 95% of the email I get is internal, with
> a few external messages. Because the internal mail arrives directly via
> Exchange, its headers differ markedly from those on external mail.
> 
> When I trained SpamBayes on my recent spam and my (100% squeaky clean) Inbox,
> it decided that everything external containing a URL is spam. That doesn't
> work for the 5% of external mails, and even after I recover them it still
> rates them around 67% spam. (Example attached. Ironically, the email is from
> SpamCop, which is reasonably respectable.)
> 
> Alternatively, can we get it to be less aggressive about the fact that it's got external headers?
> The scoring from my name and having a return path seems ridiculously high.

The general approach of spambayes isn't amenable to giving more or less
weight to certain kinds of evidence.  You either let it see the headers
or you don't.  One possibility is to configure your spambayes to ignore
some or all headers.

I expect you've got a training database with a lot more ham than spam. 
Another possibility is to train on a set of messages where the numbers of
hams and spams are closer to each other.

A third possibility is to keep training and wait a while.  Eventually,
you'll see enough non-spam from external addresses that it will
outweigh the header evidence.
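
To make the last two suggestions a little more concrete, here is a
minimal sketch of a Robinson-style per-token estimate of the kind
spambayes is built on.  It is a simplification, not the actual
classifier code: the function name, the constants 0.45 and 0.5, and
all message counts below are assumptions chosen for illustration.

```python
def token_spam_prob(spam_count, ham_count, nspam, nham,
                    unknown_word_strength=0.45, unknown_word_prob=0.5):
    """Estimate how spammy one token looks (hypothetical helper).

    spam_count / ham_count: trained spams / hams containing the token.
    nspam / nham: total trained spams / hams.
    The two keyword constants are assumed values, not confirmed defaults.
    """
    n = spam_count + ham_count
    if n == 0:
        # A token never seen in training gets the neutral prior.
        return unknown_word_prob
    # Normalise by class size so the two classes are compared as rates.
    spam_ratio = spam_count / max(nspam, 1)
    ham_ratio = ham_count / max(nham, 1)
    prob = spam_ratio / (spam_ratio + ham_ratio)
    # Pull the raw estimate toward the prior when evidence is thin.
    return ((unknown_word_strength * unknown_word_prob + n * prob)
            / (unknown_word_strength + n))


if __name__ == "__main__":
    nspam = 50           # invented: every spam carries the external header
    internal_ham = 950   # invented: internal mail never carries it
    for external_ham in (0, 5, 20, 50):
        p = token_spam_prob(nspam, external_ham, nspam,
                            internal_ham + external_ham)
        print(f"imbalanced, {external_ham:2d} external hams: {p:.3f}")
    # Same spam, but only 50 hams in total, 25 of them external.
    print(f"balanced, 25 external hams: {token_spam_prob(50, 25, 50, 50):.3f}")
```

Under these assumptions a header token that appears in every spam
stays strongly spammy while internal ham dominates the ham count, and
drops faster when the ham and spam totals are closer together.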

Jeremy




