[spambayes-dev] RE: [Spambayes] Are there plans for a daemonized or compiled version of Spambayes?

Skip Montanaro skip at pobox.com
Mon Sep 22 12:08:09 EDT 2003


    Michael> I haven't looked in depth at the Spambayes code. But being the
    Michael> sysadmin I'm able to look at the processes running on the
    Michael> system. It appears that, when an email is scanned, multiple
    Michael> python threads get forked.  Presumably this is because
    Michael> "hammiefilter.py" runs other *.py scripts, or exec's multiple
    Michael> pythons. (True? Not true?)

Not true, I don't think.

    Michael> Assuming that's what's happening, I guess I was wondering if it
    Michael> would be beneficial, in the sense of being less demanding on
    Michael> system resources, to consolidate all the routines into a single
    Michael> python thread? Is this feasible and worthwhile?

I tried that quite a while ago, but I didn't write the front-end client in C,
just in Python.  One problem is that you replace startup overhead with
network overhead.  Assuming you keep the long-running process as a Python
program, you can try a couple of things:

    1 Write a front-end client in Python and use a very simple protocol to
    communicate with the server (say, a byte count followed by the
    message).  The server would send back either the message augmented with
    the usual scoring headers or just the score information, leaving it to
    the client to add that information to the message.

    2 If (and only if) the above isn't fast enough, write the simplest
    front-end client you can in C to avoid Python startup overhead.
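A minimal sketch of option 1, assuming a wire format of a 4-byte big-endian byte count followed by the message.  That framing, the port number, the `X-Score` header name, and the `score()` stand-in are all hypothetical; a real server would call the actual classifier instead.

```python
import socket
import struct

# Hypothetical wire format: 4-byte big-endian length, then the message bytes.
LENGTH = struct.Struct("!I")


def recv_exactly(sock, n):
    """Read exactly n bytes from sock; raise EOFError if the peer hangs up early."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise EOFError("connection closed mid-frame")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def send_frame(sock, payload):
    """Send the byte count, then the payload itself."""
    sock.sendall(LENGTH.pack(len(payload)) + payload)


def recv_frame(sock):
    """Read one length-prefixed frame and return its payload."""
    (length,) = LENGTH.unpack(recv_exactly(sock, LENGTH.size))
    return recv_exactly(sock, length)


def score(message):
    """Hypothetical stand-in for the real classifier; always returns 0.5."""
    return 0.5


def handle_request(conn):
    """Server side: read one message, reply with a score header prepended."""
    message = recv_frame(conn)
    header = "X-Score: %.4f\r\n" % score(message)  # made-up header name
    send_frame(conn, header.encode("ascii") + message)


def serve_forever(host="127.0.0.1", port=8880):
    """Long-running server: Python's startup cost is paid once, not per message."""
    with socket.create_server((host, port)) as listener:
        while True:
            conn, _addr = listener.accept()
            with conn:
                handle_request(conn)


def score_via_server(message, host="127.0.0.1", port=8880):
    """Client side: send one message and return the server's annotated copy."""
    with socket.create_connection((host, port)) as sock:
        send_frame(sock, message)
        return recv_frame(sock)
```

The framing can be exercised without a listening port by joining the two ends with `socket.socketpair()`, calling `send_frame` on one end and `handle_request` on the other.  Note that `score_via_server` is still a Python client, so each invocation from the MTA pays interpreter startup; that is the cost option 2 is meant to remove.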

The first one will give you some idea of what you're up against.  Python's
startup is probably the bottleneck, and a Python client still pays that cost
for every message, so I'm skeptical that the first option will gain you
anything besides an architecture that is easy to experiment with.  Once the
startup overhead is out of the way, the Python-based server scores messages
very quickly.
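One rough way to check whether startup really is the bottleneck is to time a fresh interpreter launch directly.  This is a sketch only: `import email, pickle` is an arbitrary stand-in for loading a real filter, not a measurement of Spambayes itself.

```python
import statistics
import subprocess
import sys
import time


def median_launch_seconds(code="pass", runs=5):
    """Median wall-clock time to start a fresh interpreter, run `code`, and exit."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([sys.executable, "-c", code], check=True)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


if __name__ == "__main__":
    bare = median_launch_seconds()
    # Rough stand-in for the imports a real filter would need at startup.
    loaded = median_launch_seconds("import email, pickle")
    print("bare interpreter: %.3f s" % bare)
    print("with imports:     %.3f s" % loaded)
```

If Spambayes is installed, substituting an import of its own modules (for example `from spambayes import hammie`, assuming that module path) gives a closer estimate of what each per-message process pays before any scoring happens.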

    Michael> Another thing I'm thinking about doing to mitigate the impact
    Michael> on resources, is running the hammiefilter in ramdisk.

It probably won't buy you much, since the OS will usually keep recently read
files cached in memory anyway, but it's simple enough to try.  Make sure you
copy your database (pickle or bsddb file) to the ramdisk as well.
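A small sketch of the copy step, assuming a Linux system where `/dev/shm` is a tmpfs mount; the default directory and the `copy_back` helper are hypothetical, and hammiefilter would still need to be pointed at the copied path.

```python
import os
import shutil


def copy_to_ramdisk(db_path, ramdisk_dir="/dev/shm/spambayes"):
    """Copy the database file into a RAM-backed directory; return the new path.

    The default location is an assumption; adjust it for your system.
    """
    os.makedirs(ramdisk_dir, exist_ok=True)
    dest = os.path.join(ramdisk_dir, os.path.basename(db_path))
    shutil.copy2(db_path, dest)  # copy2 also preserves timestamps and permissions
    return dest


def copy_back(ramdisk_path, db_path):
    """Write the RAM copy back to disk so training updates survive a reboot."""
    shutil.copy2(ramdisk_path, db_path)
```

Anything written to the ramdisk copy (for example, by training) is lost at reboot unless it is copied back, which is why the sketch includes `copy_back`.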

Skip
