[Spambayes] Win32 Command Line
tameyer at ihug.co.nz
Thu Oct 14 22:23:35 CEST 2004
> I compared the source code for sb_filter.py that
> I was running and the source code from the zip file and I see that
> somewhere along the line the line action(msg) was removed.
> Put it back in and we have PERFECTION!
> Now, my next task is to find a way to OPTIMIZE sb_filter so
> that it doesn't need to reload the database with every single
> execution. My goal is to put this thing on a server as a mail
> filter. I want to limit the overhead on each execution if possible.
> Any ideas on how to make the database portion stay resident
> and use it with calls to sb_filter?
This is the goal of sb_bnfilter/sb_bnserver (there's also a C version of one
of them, to optimise for speed, although it's not in any releases yet).
However, those scripts are definitely *nix only, as the server & filter
scripts communicate over unix domain sockets. The scripts could be adapted
to work on Windows, but I think almost all of their code would need to
change, since they do little beyond the socket handling (the core SpamBayes
code does most of the work).
Alternatively, new Windows versions of the scripts (that used regular TCP/IP
sockets) could be created. However, this sounds much like sb_server &
sb_upload, which you could probably use instead (although you have just
spent all that time figuring out sb_filter...).
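
To make the "stay resident" idea concrete, here is a rough sketch of a
server that loads its data once and a thin client that talks to it over a
regular TCP/IP socket on the loopback address. It is an illustration only,
not the sb_bnserver or sb_server code: load_database, score, ScoreHandler,
start_server and classify are all made-up names, and the toy word table
stands in for the real SpamBayes database and classifier.

```python
import socket
import socketserver
import threading


def load_database():
    """Stand-in for the expensive load step (hypothetical, not SpamBayes)."""
    return {"viagra": 0.99, "meeting": 0.05}


def score(db, text):
    """Toy scorer: mean of known word scores, 0.5 when no word is known."""
    known = [db[w] for w in text.lower().split() if w in db]
    return sum(known) / len(known) if known else 0.5


class ScoreHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # The client sends the message, then half-closes, so read to EOF.
        text = self.rfile.read().decode("utf-8", "replace")
        reply = "%.2f\n" % score(self.server.db, text)
        self.wfile.write(reply.encode("ascii"))


def start_server(host="127.0.0.1", port=0):
    """Load the database once, then serve requests in a background thread."""
    server = socketserver.TCPServer((host, port), ScoreHandler)
    server.db = load_database()  # loaded once and kept in memory
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def classify(address, text):
    """Thin client: the only part that runs once per incoming message."""
    with socket.create_connection(address) as conn:
        conn.sendall(text.encode("utf-8"))
        conn.shutdown(socket.SHUT_WR)  # tell the server the message is done
        return float(conn.makefile("r").read())


if __name__ == "__main__":
    server = start_server()  # port 0 lets the OS pick a free port
    print(classify(server.server_address, "viagra viagra"))
    server.shutdown()
    server.server_close()
```

Binding to 127.0.0.1 keeps the service reachable only from the same
machine, and nothing here depends on unix domain sockets, so the same
approach should work on Windows.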
If you run sb_server without configuring it for any POP3/SMTP proxies, it
just runs the web interface. You could ignore that; by default it is
accessible to any other program on that machine, but not from elsewhere. You
can use sb_upload (from CVS - the training switches are new) to train
messages, much like you use sb_filter.
That leaves classifying: you could probably still use sb_filter for this.
Generally, having two processes access the same database isn't good, but
since one of them is read-only, and assuming that you'd have it set up so
that it was never training at the same time as it was classifying, it should
be ok, I think. Otherwise you'd need to patch sb_upload to be able to
classify (I could probably do this for you).
Have a think about what you'd like, and I'm happy to help out with coding
(as my time allows).