[Spambayes] Setting the Spambayes timers

Tony Meyer tameyer at ihug.co.nz
Wed Aug 18 07:21:18 CEST 2004


> Can anyone advise me on the optimum settings for the 
> 'Processing Start Delay' and 'Delay between processing items' 
> settings

I believe optimum would very much depend on your specific setup (how fast
Outlook processes mail, basically).  The default settings are our best guess
for any given setup.

> or direct me to a website or document where these 
> settings are explained?

[I don't think this is anywhere on the website, so I'll do it here]

The purpose of the settings is to ensure that SpamBayes filters messages
*after* all Outlook rules have completed.  Without them, SpamBayes might filter
before Outlook's rules, after Outlook's rules, or (and this is bad) during
Outlook's rules.  It's unfortunately impossible to tell with the information
Outlook makes available.  We can't ensure that we run before (which would be
nicer in some ways), but we can wait and be sure that we're after.

'Processing Start Delay' is therefore the amount of time that we wait after
Outlook says 'new mail has arrived in this folder you are watching' before
we filter that message.  This should be long enough for the Outlook rules to
process that message.

'Delay between processing items' is the amount of time that we wait after
filtering that first message, and before processing the next one (and so
on).  This should be long enough for the Outlook rules to process the next
message (etc).
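
As a rough illustration of how the two delays interact, here is a minimal
Python sketch.  It is a simplified model and not the add-in's actual code:
the function name filter_schedule and its parameters are hypothetical, and
the default values are only the ones I happen to use.

```python
def filter_schedule(arrivals, start_delay=2.0, item_delay=1.0):
    """Return the time (in seconds) at which each message would be filtered.

    arrivals    -- sorted times at which Outlook reported new mail
    start_delay -- models 'Processing Start Delay'
    item_delay  -- models 'Delay between processing items'

    Simplified model: a message is filtered no sooner than start_delay
    after it arrived, and no sooner than item_delay after the previous
    message was filtered.
    """
    schedule = []
    last = None  # time the previous message was filtered
    for arrived in arrivals:
        when = arrived + start_delay
        if last is not None:
            when = max(when, last + item_delay)
        schedule.append(when)
        last = when
    return schedule


# Three messages arriving together are filtered one after another.
print(filter_schedule([0.0, 0.0, 0.0]))
# With both delays at zero there is no wait at all for Outlook's rules.
print(filter_schedule([0.0, 0.0, 0.0], start_delay=0.0, item_delay=0.0))
```

In this model, larger values give Outlook's rules more time to finish, at
the cost of messages sitting in the Inbox longer before they are filtered.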

> If I leave both values set to zero,

If you have them at zero, you might as well just turn background filtering
off.

> then anything 
> from 10-30% of the incoming spam ends up unscored, and 
> examination of the spambayes.log file shows that these items 
> were never 'seen' by the program.

These 10-30% of messages are the ones that get processed by Outlook's rules
before SpamBayes finds out about them.  The other 70-90% are ones that
SpamBayes finds out about first.  You'll find that this changes depending on
a whole lot of things (how busy the machine is with other things, for
example).

> (Curiously, such skipped 
> messages are always spam. I can't recall ever seen a good 
> message that was unrated.)

This is a coincidence.

> I've experimented with different combinations of 0.5, 1 and 2 
> second settings. At best, this seems to let a lot of spam 
> into the inbox, then leave me waiting while Spambayes plods 
> down the listed mail and banishes the spam item by item. I 
> have yet to find settings which make sure that all the 
> incoming mail is 'seen' and analysed, and never gets into my inbox

SpamBayes doesn't touch the incoming mail process, so all mail will always
end up where Outlook delivers it (e.g. the Inbox) regardless of any
settings.  All you can change is how quickly SpamBayes finds out about it.
I use 2.0 and 1.0 (the defaults?) and that works fine for me - the messages
are in my Inbox for such a short time that I never really notice them.

Actually, I could turn off background filtering if it bothered me (but it
doesn't) since all but one of my rules run on an Exchange server, and those
always get run before SpamBayes.

> For most of the time, the mail which is 'seen' is recognised 
> correctly. My normal mail tends to be technical in nature, 
> and experience has shown that I can set Spambayes to regard 
> messages with spam scores of more than 5% as being junk mail, 
> and anything above 0.1% as suspect. Even with such extreme 
> settings, the false positive rate is near zero. However, I 
> keep seeing some messages whose subject line is blatantly 
> 'spammy' being rated as 0% spam.

This is a completely different issue.  If you're not sure why a message is
scoring what it is, then the best way to figure it out is to select the
message, choose "Show spam clues for this message" from the SpamBayes menu
and look at the clues list.  You'll probably see why that is, but if not,
you can send it on to this list, and we'll try and explain it.

> This may be due to the 
> spammers' ingenuity in finding new ways to spell words such 
> as 'viagra'.

This is very unlikely.  Any word that SpamBayes hasn't seen before scores
0.5 and isn't used in the message score calculation, so a brand-new spelling
is simply ignored.  If SpamBayes *has* seen the spelling before, I would
think that it's much more likely to be a spam clue than a ham one.  Either
way, there should be plenty of other clues in the message.
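
To make the point about unseen words concrete, here is a small Python
sketch.  It is an illustration only, not the SpamBayes classifier: the
names significant_clues, UNKNOWN_WORD_PROB and MIN_STRENGTH are
hypothetical, the 0.1 cutoff is an assumed value, and the step that
combines the clues into a final score is left out.

```python
UNKNOWN_WORD_PROB = 0.5  # score given to a token that has never been seen
MIN_STRENGTH = 0.1       # assumed cutoff: clues this close to 0.5 are dropped


def significant_clues(tokens, known_probs):
    """Return the (token, spam probability) pairs that would count as clues.

    tokens      -- tokens found in the message
    known_probs -- mapping of previously seen tokens to spam probabilities
    """
    clues = []
    for tok in set(tokens):
        prob = known_probs.get(tok, UNKNOWN_WORD_PROB)
        # A token scoring 0.5 says nothing either way, so it is skipped.
        if abs(prob - 0.5) >= MIN_STRENGTH:
            clues.append((tok, prob))
    return sorted(clues, key=lambda clue: (clue[1], clue[0]))


# Made-up probabilities, purely for illustration.
known = {"viagra": 0.99, "python": 0.02, "the": 0.5}
print(significant_clues(["v1agra", "viagra", "python", "the"], known))
```

The never-seen spelling "v1agra" drops out because it scores exactly 0.5,
while the tokens with strong scores in either direction remain as clues.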

=Tony Meyer

---
Please always include the list (spambayes at python.org) in your replies
(reply-all), and please don't send me personal mail about SpamBayes. This
way, you get everyone's help, and avoid a lack of replies when I'm busy.


