[Spambayes] separating training stuff from pop3proxy - how hard?

Skip Montanaro skip at pobox.com
Wed Jan 15 14:43:17 EST 2003

I'm sure others have considered this already, but I began wondering today
how hard it would be to separate pop3proxy into two pieces, the proxy stuff
and the training/web stuff.  I think having a separate training interface
would be good because it could then be used by other spambayes tools.

For example, just today I modified some Mailman-managed mailing lists to
pump incoming messages through "hammie.py -f" before passing them along to
Mailman's wrapper:

    export BAYESCUSTOMIZE=$BAYESHOME/hammie.opt

    /usr/local/bin/hammie.py -f -d -p $BAYESHOME/hammie.db \
    | /usr/local/bin/stripmime.pl \
    | /home/mailman/mail/wrapper "$@"

(Please don't flog me for using stripmime.pl.  I'm sure there are better
MIME strippers out there, but it works fine for my needs. ;-)

For the time being I'm just using my own training database which is a
superset of what goes to that particular mailing list.

The "bright idea" I had today was that it would be great to simply modify
the above pipeline to

    /usr/local/bin/hammie.py -f -d -p $BAYESHOME/hammie.db \
    | tee /tmp/cedu-list-trainer \
    | /usr/local/bin/stripmime.pl \
    | /home/mailman/mail/wrapper "$@"

and have the training stuff from pop3proxy waiting on a Unix named pipe
named /tmp/cedu-list-trainer.  At my leisure I could then visit the web
interface and train any collected messages.
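A minimal sketch of what the receiving end of that named pipe might look like, assuming a standalone collector rather than anything that exists in pop3proxy today. The function names (ensure_fifo, collect_one, serve_forever) and the spool directory are hypothetical. It only saves each message to a file for later training; it does not talk to the classifier.

```python
import os
import stat
import tempfile


def ensure_fifo(path):
    """Create the named pipe if needed; refuse to reuse a non-FIFO file."""
    if not os.path.exists(path):
        os.mkfifo(path, 0o600)
    elif not stat.S_ISFIFO(os.stat(path).st_mode):
        raise RuntimeError("%s exists and is not a FIFO" % path)


def collect_one(fifo_path, spool_dir):
    """Wait for one writer, read until it closes the pipe, save the bytes.

    Returns the path of the saved file, or None if the writer sent nothing.
    """
    # open() blocks here until some process opens the FIFO for writing.
    with open(fifo_path, "rb") as fifo:
        data = fifo.read()          # returns once all writers have closed
    if not data:
        return None
    fd, name = tempfile.mkstemp(suffix=".msg", dir=spool_dir)
    with os.fdopen(fd, "wb") as out:
        out.write(data)
    return name


def serve_forever(fifo_path, spool_dir):
    """Hypothetical main loop: collect messages until interrupted."""
    ensure_fifo(fifo_path)
    while True:
        collect_one(fifo_path, spool_dir)
```

Calling serve_forever("/tmp/cedu-list-trainer", some_spool_dir) would keep a reader on the pipe. Two caveats with this approach: "tee" blocks when opening the pipe if no reader is running, which would stall mail delivery, and two messages arriving at the same moment can end up merged in a single read.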

The "tee" command could be replaced by a simple little tee-like program
which disposed of the file in some other fashion, perhaps by using HTTP PUT
to toss it at the training server.
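A rough sketch of such a tee-like filter, assuming a training server that accepts a raw message via HTTP PUT. No such endpoint exists in pop3proxy as far as this message describes, so the URL, the Content-Type choice, and the names put_message and tee_put are all hypothetical.

```python
import sys
import urllib.request


def put_message(url, data, timeout=10):
    """PUT the raw message bytes to the (hypothetical) training server."""
    req = urllib.request.Request(
        url, data=data, method="PUT",
        headers={"Content-Type": "message/rfc822"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


def tee_put(url, infile, outfile, put=put_message):
    """Copy infile to outfile unchanged, then send a copy to the server.

    Returns True if the copy was accepted, False if sending failed.
    """
    data = infile.read()
    # Pass the message downstream first so delivery never waits on training.
    outfile.write(data)
    outfile.flush()
    try:
        put(url, data)
    except Exception as exc:
        # A dead training server must not break the mail pipeline.
        sys.stderr.write("training copy not sent: %s\n" % exc)
        return False
    return True
```

In the pipeline this would sit where "tee" does, run from a small script that calls tee_put(url, sys.stdin.buffer, sys.stdout.buffer). Unlike a named pipe, a failed PUT only loses the training copy; the message itself still reaches Mailman.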

Any thoughts on this?  Richie?



