[Spambayes] RE: Ideas for an MSc project please...

Christopher Jastram cej at intech.com
Mon Feb 9 22:23:46 EST 2004


Ryan Malayter wrote:

> [Bob Coe]
>  
>
>> Actually, Chris wasn't complaining about the unreliability of his 
>> mail system; he was complaining about the impact on his throughput of 
>> server-side filtering. And I think he has a point.   
>
>
> So Chris knows that server-side Bayesian filtering is resource
> intensive, but still chooses to implement it on an old desktop from
> 1999. Rather than bemoan the resource intensive nature of filtering he
> chose to use, why not try some reasonably modern hardware for a critical

What an attitude!  Frankly, we didn't really have the budget for 
purchasing new hardware, especially when the existing hardware was 
well-suited for the job.  I've spent a considerable amount of time 
slimming the whole system down, and it ran very well, with a large 
margin for periods of heavy stress.

> business function? Peak-demand planning is part of a systems
> administrator's job.

True.  See above note about budget.  Dropping Exchange was like doubling 
megahertz.  Big improvement.  I don't consider it a matter of vital 
importance to be running the fastest turf-pounding, testosterone-pumped 
machines available when they aren't needed.

> I don't try to run my database servers on
> five-year-old desktops, because I *know* SQL servers require more
> horsepower when more than a few connections are in place.

Actually, SQL servers are *designed* to run on desktop machines.  And 
unless you're running MS SQL server or Oracle w/ a large database, SQL 
servers run *fine* on a desktop.  Done it, doing it, will do it in the 
future.  Look at what runs Slashdot -- their web server is a PIII 600 
MHz box, and the database is a quad Xeon 550.  We run barely a fraction 
of their traffic.  I think it is incompetent of an administrator to 
demand truckloads of CPU when something smaller will be more 
cost-effective, more stable, and do the job with plenty of room to spare.

> My point is this: spam is not going away soon, nor are viruses. Systems
> administrators have known this for several years. Planning and
> maintaining the infrastructure to deal with that is our job. Buying new
> hardware doesn't solve the majority of IT problems, but it could have
> solved this one for Chris had he planned well.

The mail server was handling the load fine with about 10-15 thousand 
emails/day.  It ran into trouble with 350 thousand.  Proper resource 
planning might have eased the problem, but would not have solved it.  
Could I really justify spending $2,500 for a couple powerhouse mail 
servers that would sit mostly idle when the existing mail server was 
perfectly capable of handling 5 times the load (especially after we 
dropped Exchange)?  The spam load had been increasing by about 200 
messages/day since spring 2002 with surprising steadiness, and we 
probably had 18 months to go before things might get tight wrt resources.

So it goes.

Spam filtering was the first thing to go when the load got heavy.  And 
it will continue to be the first thing to go, since it takes a lion's 
share of resources.  However, I think it could be done a little 
differently, along these lines:

1) Primary mail server accepts email, does basic validity check (sender 
checks, etc.)
2) Postfix passes the mail through a content_filter script
3) The content_filter script passes the mail to a *different* 
machine, running spambayes over RPC (sorta, maybe.  Still figuring this 
out).  If the filter server gets heavily loaded, it'll start passing a 
percentage of mail through without filtering.  If the mail server 
doesn't get a reply back quickly, it moves the mail along without 
waiting for classification.
4) The filter machine passes the classified email back to the 
content_filter script
5) Postfix hands classified email to Cyrus for delivery.
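
To make steps 3 and 4 a bit more concrete, here is a rough Python sketch 
of the two "give up gracefully" rules.  None of this is running code from 
our setup: the function names, the load thresholds, and the idea of 
passing the classifier in as a callable are all assumptions made for 
illustration.  The header name is the one spambayes normally adds, but 
check your own configuration.

```python
import random

# Header name assumed here; verify against your spambayes configuration.
HEADER = "X-Spambayes-Classification"


def should_filter(load, soft_limit=0.5, hard_limit=1.0, rng=random.random):
    """Filter-server side: decide whether to classify this message.

    `load` is any normalised load figure (hypothetical; e.g. load average
    divided by CPU count).  Below soft_limit everything is filtered, above
    hard_limit nothing is, and in between the skipped share grows linearly.
    """
    if load <= soft_limit:
        return True
    if load >= hard_limit:
        return False
    skip_fraction = (load - soft_limit) / (hard_limit - soft_limit)
    return rng() >= skip_fraction


def tag_message(raw_message, classify):
    """Mail-server side: ask the remote classifier, but fail open.

    `classify` is any callable that returns a verdict string and raises
    on timeout or connection trouble (for instance, a wrapper around an
    RPC call with a short socket timeout).  If it fails for any reason,
    the message is returned untouched so delivery is never held up.
    """
    try:
        verdict = classify(raw_message)
    except Exception:
        return raw_message
    # Prepend the verdict as one extra header line.
    return f"{HEADER}: {verdict}\n{raw_message}"
```

The content_filter script would call tag_message() on each message and 
hand the result back to Postfix either way, so a dead or swamped filter 
box costs you classification, not mail delivery.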

Please don't cast stones quite so quickly.  What we had was plenty good 
enough for the job.  I call that virus a "disaster" (we were not hit 
badly by previous email viruses), which is why we have "disaster 
preparedness," i.e., SSH and the ability to strip things down to a lean, 
mean fighting machine.

And we are now buying a couple 64-bit AMD 3 GHz machines w/ RAID1 hard 
disks for just this sort of problem.  Cool, huh?  (Don't think I 
didn't ask for these, because I did.  Quite a while ago.)

Christopher Jastram


