[Spambayes] Are there plans for a daemonized or compiled version of Spambayes?

Martinez, Michael MMARTINEZ at CSREES.USDA.GOV
Mon Sep 22 08:57:25 EDT 2003


Let's discuss this a little.

I haven't looked in depth at the Spambayes code. But as the sysadmin,
I'm able to look at the processes running on the system. It appears
that, when an email is scanned, multiple python processes get forked.
Presumably this is because "hammiefilter.py" runs other *.py scripts, or
execs multiple pythons. (True? Not true?)

Assuming that's what's happening, I was wondering whether it would be
beneficial, in the sense of being less demanding on system resources, to
consolidate all the routines into a single python process. Is this
feasible and worthwhile?

I'm harping on the multiple-process issue because, under high email
volume, the number of python processes grows exponentially.

Another thing I'm thinking about doing to mitigate the impact on
resources is running the hammiefilter from a ramdisk.

Suggestions are welcome.

Michael Martinez
Linux System Administrator
ISTM/CSREES
United States Department of Agriculture

-----Original Message-----
From: Tim Peters [mailto:tim.one at comcast.net] 
Sent: Saturday, September 20, 2003 6:36 PM
To: Martinez, Michael
Cc: spambayes at python.org; spambayes-dev at python.org
Subject: RE: [Spambayes] Are there plans for a daemonized or compiled
version of Spambayes?

[Martinez, Michael]
> I've been running Spambayes on our agency Linux smtp gateway for
> several months and very happy with its classification of spam. My
> gateway is a qmail system and it pipes all incoming email through the
> hammiefilter prior to delivery.

Yup, running a distinct classifier for each email is a pretty crazy design
for high-volume use.

> However, a performance problem arises when the gateway gets hit during
> peak hours with a lot of emails. What happens is the system slows down
> tremendously, in part due to the number of python instances that get
> forked in order to scan the emails.
>
> I was wondering: are there any plans to develop a lightweight,
> daemonized version of Spambayes?

The answer to that depends on you too:  what are your plans?  Python is a C
program, and can be daemonized like any other.  Note the project's pspam
directory sets up a classifier backed by a ZODB database, which can be
attached to via opening a ZEO connection.  That would be a pleasant way to
let multiple clients hook up at will to an always-running classifier.
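
To make the always-running idea concrete, here is a minimal sketch using
only the Python standard library. It is not the pspam/ZEO setup and not
Spambayes code: load_classifier() is a hypothetical toy stand-in for
opening the trained database, and ClassifierServer, ScoreHandler,
classify_via_daemon, the port number and the one-line reply format are
all assumptions made for illustration. The point is only that the
expensive setup happens once, and each email costs a short socket round
trip instead of a fresh interpreter.

```python
import socket
import socketserver


def load_classifier():
    """Hypothetical stand-in for opening the trained database once.

    A real daemon would open the Spambayes database here. This toy
    version returns a scoring function so the sketch runs on its own.
    """
    spammy_words = {"viagra", "lottery", "winner"}

    def score(text):
        words = text.lower().split()
        if not words:
            return 0.0
        hits = sum(1 for word in words if word in spammy_words)
        return hits / len(words)

    return score


class ScoreHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # The client half-closes the connection after sending, so
        # reading to EOF yields the whole message.
        text = self.rfile.read().decode("utf-8", errors="replace")
        score = self.server.score(text)
        self.wfile.write(("%.4f\n" % score).encode("ascii"))


class ClassifierServer(socketserver.ThreadingTCPServer):
    allow_reuse_address = True
    daemon_threads = True

    def __init__(self, address):
        super().__init__(address, ScoreHandler)
        # Loaded once at startup and shared by every request.
        self.score = load_classifier()


def classify_via_daemon(address, text):
    """Small client: send one message, read back its score."""
    with socket.create_connection(address) as sock:
        sock.sendall(text.encode("utf-8"))
        sock.shutdown(socket.SHUT_WR)  # signal end of message
        with sock.makefile("rb") as reply:
            return float(reply.read().decode("ascii"))


def run(host="127.0.0.1", port=8765):
    """Serve until interrupted. The port number is an arbitrary choice."""
    with ClassifierServer((host, port)) as server:
        server.serve_forever()
```

With something like this in place, the per-message filter that qmail
pipes mail through would shrink to a thin client calling
classify_via_daemon(), while run() stays up under whatever supervisor
the system already uses.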

> In the same vein, are there plans to port it to C or another compiled
> language?

AFAICT, the most expensive part of running spambayes now is running Berkeley
database lookups, and the Sleepycat bsddb implementation is already written
in C.  So profile before you presume to know what would help.  Based on what
I've measured, my interest in recoding any of the rest in C is nil.
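
For anyone who wants to do that profiling, a rough sketch with the
standard cProfile and pstats modules follows. The tokenize(), lookup()
and score_message() functions are hypothetical placeholders, not the
Spambayes internals, and the dict stands in for the database; swap in
the real scoring call to see where the time actually goes.

```python
import cProfile
import io
import pstats


def tokenize(text):
    # Placeholder tokenizer: the real one is far more elaborate.
    return text.lower().split()


def lookup(db, token):
    # Placeholder for a database lookup; unknown tokens score 0.5.
    return db.get(token, 0.5)


def score_message(db, text):
    # Placeholder scoring: a plain average of per-token values.
    probs = [lookup(db, token) for token in tokenize(text)]
    return sum(probs) / len(probs) if probs else 0.5


def profile_scoring(db, messages, top=10):
    """Profile scoring of the given messages and return a text report."""
    profiler = cProfile.Profile()
    profiler.enable()
    for text in messages:
        score_message(db, text)
    profiler.disable()

    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(top)
    return out.getvalue()
```

Printing profile_scoring(db, messages) for a representative batch of
mail lists the functions with the largest cumulative time first, which
is the evidence to collect before deciding what, if anything, is worth
rewriting.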

> How difficult would this be?

It would be extremely tedious.  You don't escape the needs for a database,
for I/O, or for a variety of complex string-processing operations.  The
parts of the Python implementation that supply those to Python programmers
are already coded in C, but much easier to use from Python than from C.



