[Spambayes] RE: Ideas for an MSc project please...
cej at intech.com
Mon Feb 9 22:23:46 EST 2004
Ryan Malayter wrote:
> [Bob Coe]
>> Actually, Chris wasn't complaining about the unreliability of his
>> mail system; he was complaining about the impact on his throughput of
>> server-side filtering. And I think he has a point.
> So Chris knows that server-side Bayesian filtering is resource
> intensive, but still chooses to implement it on an old desktop from
> 1999. Rather than bemoan the resource intensive nature of filtering he
> chose to use, why not try some reasonably modern hardware for a critical
What an attitude! Frankly, we didn't really have the budget for
purchasing new hardware, especially when the existing hardware was
well-suited for the job. I've spent a considerable amount of time
slimming the whole system down, and it ran very well, with a large margin
for periods of heavy stress.
> business function? Peak-demand planning is part of a systems
> administrator's job.
True. See above note about budget. Dropping Exchange was like doubling
megahertz. Big improvement. I don't consider it a matter of vital
importance to be running the fastest turf-pounding, testosterone-pumped
machines available when they aren't needed.
> I don't try to run my database servers on
> five-year-old desktops, because I *know* SQL servers require more
> horsepower when more than a few connections are in place.
Actually, SQL servers are *designed* to run on desktop machines. And
unless you're running MS SQL server or Oracle w/ a large database, SQL
servers run *fine* on a desktop. Done it, doing it, will do it in the
future. Look at what runs Slashdot -- their web server is a PIII 600
MHz box, and the database is a quad xeon 550. We hardly run a fraction
of their traffic. I think it is incompetent of an administrator to
demand truckloads of CPU when something smaller will be more
cost-effective, stabler, and do the job with plenty of room to spare.
> My point is this: spam is not going away soon, nor are viruses. Systems
> administrators have known this for several years. Planning and
> maintaining the infrastructure to deal with that is our job. Buying new
> hardware doesn't solve the majority of IT problems, but it could have
> solved this one for Chris had he planned well.
The mail server was handling the load fine with about 10-15 thousand
emails/day. It ran into trouble with 350 thousand. Proper resource
planning might have eased the problem, but would not have solved it.
Could I really justify spending $2,500 for a couple powerhouse mail
servers that would sit mostly idle when the existing mail server was
perfectly capable of handling 5 times the load (especially after we
dropped Exchange)? The spam load had been increasing by about
200 messages/day since spring 2002 with surprising steadiness, and we
probably had 18 months to go before resources might get tight.
So it goes.
Spam filtering was the first thing to go when the load got heavy. And
it will continue to be the first thing to go, since it takes a lion's
share of resources. However, I think it could be done a little
differently, along these lines:
1) Primary mail server accepts email, does basic validity check (sender
2) Postfix passes the mail through a content_filter script
3) The content_filter script passes the mail through a *different*
machine, running spambayes on RPC (sorta, maybe. Still figuring this
out). If the filter server gets heavy, it'll start passing a percentage
of mail through without filtering. If the mail server machine doesn't
get a reply back quickly, it moves the mail without waiting for
4) The filter machine passes the classified email back to the
5) Postfix hands classified email to Cyrus for delivery.
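To make step 3 a bit more concrete, here is a rough Python sketch of the two fallback behaviours: the filter box skipping a growing fraction of mail once it gets busy, and the mail server giving up on a slow reply and passing the message along unclassified. This is only an illustration under assumptions, not a working Postfix or Spambayes integration. The names should_skip, filter_message and remote_classify are hypothetical, remote_classify stands in for whatever RPC call ends up being used, and the header name is an assumption about what the classifier would add.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from email import message_from_string

# Assumed header name for the classification result.
HEADER = "X-Spambayes-Classification"

# Small worker pool so a slow remote call cannot block the caller.
_pool = ThreadPoolExecutor(max_workers=4)


def should_skip(load, threshold=0.8, rng=random.random):
    """Filter-server side: decide whether to pass a message unfiltered.

    load is a number in [0, 1]; threshold must be below 1. Under the
    threshold nothing is skipped. Above it, the skipped fraction grows
    linearly and reaches 100% at load == 1.
    """
    if load <= threshold:
        return False
    skip_fraction = min(1.0, (load - threshold) / (1.0 - threshold))
    return rng() < skip_fraction


def filter_message(raw, remote_classify, timeout=2.0):
    """Mail-server side: ask the remote filter, but never wait too long.

    remote_classify is a hypothetical callable taking the raw message
    text and returning a verdict string. On timeout or any other
    failure the message is returned without a classification header.
    """
    msg = message_from_string(raw)
    future = _pool.submit(remote_classify, raw)
    try:
        verdict = future.result(timeout=timeout)
    except Exception:
        # Covers both a timed-out wait and an error inside the call.
        verdict = None
    if verdict is not None:
        del msg[HEADER]  # drop any header forged by the sender
        msg[HEADER] = verdict
    return msg.as_string()
```

The point of the design is that filtering is strictly optional on the delivery path: if the filter machine is slow, overloaded, or down, mail still moves, just without a classification.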
Please don't cast stones quite so quickly. What we had was plenty good
enough for the job. I call that virus (we had not been hit badly by
previous email viruses) a "disaster," which is why we have "disaster
preparedness," i.e., SSH and the ability to strip things down to a lean,
mean fighting machine.
And we are now buying a couple of 64-bit AMD 3 GHz machines w/ RAID1 hard
disks for just this sort of problem. Cool, huh? (Don't think I
didn't ask for these, because I did. Quite a while ago.)