[Spambayes] Whitelisting

Meyer, Tony T.A.Meyer at massey.ac.nz
Sat Aug 2 00:33:25 EDT 2003


> I've followed the debate about whitelisting, but would like to add my few
> pence.  I agree in principle that whitelisting should occur naturally as a
> result of training, but in practice this is not working for me, and I
> suspect that others may see the same problem.
 
Out of interest, how poorly is it working?  (For example, what percentage of mails are false positives or unsures, and how highly do they score?)
 
> I receive a number of mailings which contain a combination of what I term as
> ham and spam in the same mail, e.g. an amount of ham content supported by a
> lot of spam advertisement.  I refer particularly to dictionary.com's "word of
> the day" mailing as this causes me the most trouble.
 
I get this mailing too, and spambayes has no trouble distinguishing it from the spam that I get.  Do you get some sort of spam that looks a *lot* like this?  If you look at the 'clues' for the message, does anything really stand out, or is there really an equal mix?
 
I understand the point about training on spam that contains ham words (and vice versa), but this is really the way the magic works (otherwise it'd just be a rule-based system).  The classifier should be able to figure out which words *never* occur in [h|sp]am and which ones are just weak indicators, and adjust accordingly.
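
To make the "strong clue vs. weak indicator" point concrete, here is a minimal sketch of a per-token spam probability in the style of Gary Robinson's adjustment, which is the general approach spambayes builds on.  This is not the spambayes source: the function name is hypothetical, and the values s=0.45 and x=0.5 are assumed to match the usual defaults for the unknown-word strength and probability.

```python
def token_spam_prob(spam_count, ham_count, nspam, nham, s=0.45, x=0.5):
    """Estimate how strongly one token indicates spam (0.0 = ham, 1.0 = spam).

    spam_count / ham_count: number of trained spam / ham messages containing
    the token.  nspam / nham: total trained spam / ham messages.
    s and x are assumed values (prior strength and prior probability).
    """
    spam_ratio = spam_count / nspam if nspam else 0.0
    ham_ratio = ham_count / nham if nham else 0.0
    if spam_ratio + ham_ratio == 0:
        return x  # token never seen in training: fall back to the prior
    # Raw estimate from the training counts alone.
    p = spam_ratio / (spam_ratio + ham_ratio)
    # Pull the raw estimate toward x; the pull weakens as evidence (n) grows.
    n = spam_count + ham_count
    return (s * x + n * p) / (s + n)


# A token seen only in spam is a strong clue ...
strong = token_spam_prob(spam_count=50, ham_count=0, nspam=100, nham=100)
# ... while one seen almost equally in both stays close to 0.5.
weak = token_spam_prob(spam_count=30, ham_count=25, nspam=100, nham=100)
```

With these made-up counts, `strong` comes out above 0.99 while `weak` stays between 0.5 and 0.6, so words shared by ham and spam contribute very little to the final score.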

> For this reason I would propose an "advanced user" whitelist (not available
> by default perhaps? just in a text file?) which would allow me to completely
> ignore mails which I know contain conflicting ham/spam data.
 
I imagine that one of the main problems you will find in getting this sort of thing added is that the people doing the developing (including me!) dislike whitelists and believe that they do more harm than good (and find that spambayes classifies fine without them).  This means there really isn't any incentive to add the 'feature'.
 
Note, however, that (as it says in the FAQ), some of the commercial offshoots of spambayes (InBoxer, SpamAtBay) include whitelisting, so perhaps you should give their trials a go and see if that's what you are after?  (As a cheaper option, you could spend a day learning enough Python to add this feature to your own copy <wink>).
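
For anyone tempted by the do-it-yourself route, a rough sketch of a text-file whitelist check might look like the following.  It uses only the Python standard library and does not touch any spambayes API; the function names, the one-address-per-line file format, and the example address are all assumptions made up for illustration.

```python
from email import message_from_string
from email.utils import parseaddr


def load_whitelist(lines):
    """Return a set of lower-cased addresses, skipping blanks and # comments.

    `lines` is any iterable of strings, e.g. an open text file
    (hypothetical format: one address per line).
    """
    entries = set()
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            entries.add(line.lower())
    return entries


def is_whitelisted(raw_message, whitelist):
    """True if the address in the message's From: header is in the whitelist."""
    msg = message_from_string(raw_message)
    _name, addr = parseaddr(msg.get("From", ""))
    return addr.lower() in whitelist


# Example with a made-up sender address.
whitelist = load_whitelist(["# newsletters I trust", "", "WOTD@example.com "])
raw = "From: Word of the Day <wotd@example.com>\nSubject: test\n\nbody text\n"
skip_filter = is_whitelisted(raw, whitelist)
```

A message for which `is_whitelisted` returns True would then bypass classification altogether.  Bear in mind that a From: header is trivial to forge, so a check like this trusts whatever the sender claims.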
 
I believe that Sean is also working on some sort of grand super master filtering system that would be able to integrate different filters, including a whitelist and spambayes.  You could probably also use Outlook's rules to move 'whitelisted' mail out of the reach of spambayes, particularly using the timer option.
 
=Tony Meyer

