[Python-Dev] Re: [Python-checkins] python/nondist/sandbox/spambayes GBayes.py,1.7,1.8

Eric S. Raymond esr@thyrsus.com
Wed, 21 Aug 2002 02:22:26 -0400


Zack Weinberg <zack@codesourcery.com>:
> On Wed, Aug 21, 2002 at 01:35:56AM -0400, Eric S. Raymond wrote:
> > 
> > What I'm starting to test now is a refactoring of the program where it
> > spawns a daemon version of itself the first time it's called.  The daemon
> > eats the wordlists and stays in core fielding requests from subsequent
> > program runs.  Basically an answer to the "how do you call bogofilter 1K
> > times a day from procmail without bringing your disks to their knees"
> > problem -- persistence on the cheap.
> 
> For use at ISPs, the daemon should be able to field requests from lots
> of different users, maintaining one unified word list.  Without
> needing any access whatsoever to user home directories.

I'm on it.  The following is not yet working, but it's a straight road to get
there....

There is a public spam-checker port.  Your client program sends it
packets consisting of a list of header token counts.  You
can send lots of these blocks; each one has to be under the maximum
atomic-message size for sockets (I think that's 32K).  
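
A rough sketch of the client side of that, in Python.  Nothing here is the
real wire format: the "token count" line layout, the pack_blocks name, and
the 32K figure (which is only my guess above) are all assumptions made for
illustration.  It just shows splitting one message's counts into blocks that
each stay under the size limit.

```python
MAX_BLOCK = 32 * 1024  # assumed limit; the 32K figure is a guess, not a verified value


def pack_blocks(counts, max_block=MAX_BLOCK):
    """Split a {token: count} mapping into byte blocks, each under max_block.

    Hypothetical format: one "token count" pair per line, UTF-8 encoded.
    Tokens are assumed to contain no newlines.
    """
    blocks, current, size = [], [], 0
    for token, count in sorted(counts.items()):
        line = f"{token} {count}\n".encode("utf-8")
        if len(line) >= max_block:
            raise ValueError(f"token too long for one block: {token!r}")
        # Start a new block if adding this line would reach the limit.
        if size + len(line) >= max_block:
            blocks.append(b"".join(current))
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        blocks.append(b"".join(current))
    return blocks
```

Each returned block could then go out as one packet to the checker port; the
socket plumbing is left out on purpose.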

The server accumulates the frequency counts you ship it until you say
"OK, what is it?"  Then it does the Bayes test and ships you back a result.
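
The server side of that exchange might look something like the following
Python sketch.  The Session class, the "token count" line format, and the
spam_prob table are hypothetical, and the scoring is a generic naive-Bayes
style combination used as a stand-in, not necessarily the test the real
program runs.

```python
import math
from collections import Counter


class Session:
    """Accumulates token counts for one message until a verdict is requested."""

    def __init__(self, spam_prob):
        # spam_prob: hypothetical wordlist mapping token -> P(spam | token).
        self.spam_prob = spam_prob
        self.counts = Counter()

    def feed(self, block):
        """Add one block of "token count" lines (assumed format) to the totals."""
        for line in block.decode("utf-8").splitlines():
            if not line.strip():
                continue
            token, count = line.rsplit(" ", 1)
            self.counts[token] += int(count)

    def verdict(self):
        """Return an estimated spam probability from the accumulated counts."""
        log_spam = log_ham = 0.0
        for token, n in self.counts.items():
            p = self.spam_prob.get(token, 0.5)   # unknown tokens are neutral
            p = min(max(p, 0.01), 0.99)          # keep log() finite
            log_spam += n * math.log(p)
            log_ham += n * math.log(1.0 - p)
        # prod(p) / (prod(p) + prod(1 - p)), computed in log space.
        diff = max(min(log_ham - log_spam, 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))
```

The daemon would keep one such session per client conversation, call feed()
for every block that arrives, and answer the "what is it?" request with
verdict().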
-- 
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>