[spambayes-dev] RE: [Spambayes] Are there plans for a daemonized
or compiled version of Spambayes?
skip at pobox.com
Mon Sep 22 12:08:09 EDT 2003
Michael> I haven't looked in depth at the Spambayes code. But being the
Michael> sysadmin I'm able to look at the processes running on the
Michael> system. It appears that, when an email is scanned, multiple
Michael> python threads get forked. Presumably this is because
Michael> "hammiefilter.py" runs other *.py scripts, or exec's multiple
Michael> pythons. (True? Not true?)
Not true, I don't think.
Michael> Assuming that's what's happening, I guess I was wondering if it
Michael> would be beneficial, in the sense of being less demanding on
Michael> system resources, to consolidate all the routines into a single
Michael> python thread? Is this feasible and worthwhile?
I tried it quite a while ago, but didn't code the front-end client in C, just
Python. One problem is that you substitute network overhead for startup
overhead. Assuming you maintain the long-running process as a Python
program, you can try a couple things:
  1. Write a front-end client in Python and use a very simple protocol to
     communicate with the server (maybe a byte count followed by the
     message). The server would either spit back the message augmented with
     the usual scoring headers or just the score information, relying on the
     client to embellish the message.
  2. If (and only if) the above isn't fast enough, write the simplest
     front-end client you can in C to avoid Python startup overhead.
The first one will give you some idea what you're up against. Python's
startup is probably the bottleneck, so I'm skeptical that the first option
will gain you anything besides an architecture which is simple to experiment
with. The Python-based server scores messages very quickly once the startup
overhead is out of the way.
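
To make the "byte count followed by the message" idea concrete, here is a
rough sketch of what the first option might look like. It is not Spambayes
code. The framing (a decimal byte count, a newline, then the payload), the
names send_frame, recv_frame, ScoreHandler and score_via_server, and the
score_message placeholder are all assumptions made up for illustration; a
real server would call whatever classifier it loaded once at startup.

```python
import socket
import socketserver
import threading


def score_message(raw: bytes) -> float:
    """Hypothetical placeholder scorer; always returns 0.5.

    A real long-running server would score `raw` with the classifier it
    loaded once at startup.
    """
    return 0.5


def send_frame(sock: socket.socket, payload: bytes) -> None:
    # One frame = ASCII byte count, a newline, then exactly that many bytes.
    sock.sendall(str(len(payload)).encode("ascii") + b"\n" + payload)


def recv_frame(rfile) -> bytes:
    # rfile is a buffered binary file object wrapping the connection.
    header = rfile.readline()
    if not header:
        raise EOFError("connection closed before frame header")
    size = int(header.decode("ascii").strip())
    data = rfile.read(size)
    if len(data) != size:
        raise EOFError("connection closed mid-frame")
    return data


class ScoreHandler(socketserver.StreamRequestHandler):
    """Handles one message per connection and replies with just the score."""

    def handle(self):
        message = recv_frame(self.rfile)
        reply = ("%.4f" % score_message(message)).encode("ascii")
        send_frame(self.request, reply)  # self.request is the socket


def score_via_server(host: str, port: int, raw: bytes) -> float:
    """Client side: send one message, read back the score."""
    with socket.create_connection((host, port)) as sock:
        send_frame(sock, raw)
        with sock.makefile("rb") as rfile:
            return float(recv_frame(rfile).decode("ascii"))
```

The server side would be started once with something like
socketserver.TCPServer((host, port), ScoreHandler).serve_forever(), and the
mail filter would call score_via_server() per message. Note that this client
is still a Python program, so it still pays the interpreter startup cost on
every message; it only moves the scoring work and database loading into the
long-running process.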
Michael> Another thing I'm thinking about doing to mitigate the impact
Michael> on resources, is running the hammiefilter in ramdisk.
It probably won't buy you much, but it's a simple enough thing to try. Make
sure you copy your database (pickle or bsddb file) to ramdisk as well.