[Spambayes] separating training stuff from pop3proxy - how hard?

Skip Montanaro skip at pobox.com
Wed Jan 15 14:43:17 EST 2003


I'm sure others have considered this already, but I began wondering today
how hard it would be to separate pop3proxy into two pieces, the proxy stuff
and the training/web stuff.  I think having a separate training interface
would be good because it could then be used by other spambayes tools.

For example, just today I modified some Mailman-managed mailing lists to
pump incoming messages through "hammie.py -f" before passing them along to
Mailman:

    #!/bin/bash
    BAYESHOME=/home/skip
    export BAYESCUSTOMIZE=$BAYESHOME/hammie.opt

    /usr/local/bin/hammie.py -f -d -p $BAYESHOME/hammie.db \
    | /usr/local/bin/stripmime.pl \
    | /home/mailman/mail/wrapper "$@"

(Please don't flog me for using stripmime.pl.  I'm sure there are better
MIME strippers out there, but it works fine for my needs. ;-)

For the time being I'm just using my own training database which is a
superset of what goes to that particular mailing list.

The "bright idea" I had today was that it would be great to simply modify
the above pipeline to

    /usr/local/bin/hammie.py -f -d -p $BAYESHOME/hammie.db \
    | tee /tmp/cedu-list-trainer \
    | /usr/local/bin/stripmime.pl \
    | /home/mailman/mail/wrapper "$@"

and have the training stuff from pop3proxy reading from a Unix named pipe
at /tmp/cedu-list-trainer.  At my leisure I could then visit the web
interface and train on any collected messages.
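
A rough sketch of what the reading end of that named pipe might look like
follows.  It is not pop3proxy code: the function names and the spool
directory are hypothetical, and it only saves each message to a file for
later training.  It assumes each pipeline run opens the pipe, writes one
message and closes it, so one read-to-EOF yields one message.  Two
deliveries arriving at the same moment could interleave.  Also note that
opening a FIFO for writing blocks until a reader has it open, so "tee"
would stall mail delivery whenever this collector is not running.

```python
import os
import stat
import time


def ensure_fifo(path):
    """Create a named pipe at `path` unless one already exists."""
    if os.path.exists(path):
        if not stat.S_ISFIFO(os.stat(path).st_mode):
            raise ValueError(f"{path} exists and is not a named pipe")
    else:
        os.mkfifo(path, 0o600)


def collect_messages(fifo_path, spool_dir, max_messages=None):
    """Save each message written to the FIFO as a file in spool_dir.

    Runs forever when max_messages is None; otherwise returns the list
    of saved file paths once that many messages have arrived.
    """
    ensure_fifo(fifo_path)
    os.makedirs(spool_dir, exist_ok=True)
    saved = []
    while max_messages is None or len(saved) < max_messages:
        # open() blocks until a writer (e.g. tee) opens the pipe, and
        # read() returns once that writer closes its end.
        with open(fifo_path, "rb") as fifo:
            data = fifo.read()
        if not data:
            continue  # writer opened and closed without sending anything
        # Hypothetical naming scheme: timestamp plus a running counter.
        name = f"{time.time_ns()}-{len(saved)}.msg"
        target = os.path.join(spool_dir, name)
        with open(target, "wb") as out:
            out.write(data)
        saved.append(target)
    return saved
```

Calling collect_messages("/tmp/cedu-list-trainer", spool_dir) with some
spool directory of your choosing would then accumulate messages for a
training interface to pick up.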

The "tee" command could be replaced by a simple little tee-like program
which disposed of its copy of the message in some other fashion, perhaps
by using HTTP PUT to toss it at the training server.
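
As a minimal sketch of such a tee-like program: the code below copies the
message from its input to its output unchanged, then tries to PUT a copy
to a URL.  The function names and the training URL are assumptions.  No
existing Spambayes server is known to accept PUT requests like this, so
the receiving end would have to be written too.  Upload failures are
swallowed on purpose so that a dead training server never blocks
delivery to Mailman.

```python
import sys
import urllib.error
import urllib.request


def put_message(data, url, timeout=10):
    """PUT raw message bytes to url; return True if the server accepted it."""
    request = urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={"Content-Type": "message/rfc822"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Server down, refused, timed out, or answered with an error status.
        return False


def tee_to_trainer(instream, outstream, url, timeout=10):
    """Copy instream to outstream, then send a copy of the bytes to url."""
    data = instream.read()
    # Pass the message downstream first so delivery never waits on HTTP.
    outstream.write(data)
    outstream.flush()
    return put_message(data, url, timeout)


def main(argv):
    """Entry point: argv[1] is the (hypothetical) training server URL."""
    tee_to_trainer(sys.stdin.buffer, sys.stdout.buffer, argv[1])
    return 0
```

Saved as a script that ends with sys.exit(main(sys.argv)) under the usual
__main__ guard, it would take the place of the "tee" line in the
pipeline, with the training server's URL as its only argument.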

Any thoughts on this?  Richie?

Thx,

Skip



