[Spambayes] separating training stuff from pop3proxy - how hard?

Tim Stone - Four Stones Expressions tim at fourstonesExpressions.com
Wed Jan 15 14:52:37 EST 2003


1/15/2003 2:43:17 PM, Skip Montanaro <skip at pobox.com> wrote:

>
>I'm sure others have considered this already, but I began wondering today
>how hard it would be to separate pop3proxy into two pieces, the proxy stuff
>and the training/web stuff.  I think having a separate training interface
>would be good because it could then be used by other spambayes tools.
>
>For example, just today I modified some Mailman-managed mailing lists to
>pump incoming messages through "hammie.py -f" before passing along to
>Mailman:
>
>    #!/bin/bash
>    BAYESHOME=/home/skip
>    export BAYESCUSTOMIZE=$BAYESHOME/hammie.opt
>
>    /usr/local/bin/hammie.py -f -d -p $BAYESHOME/hammie.db \
>    | /usr/local/bin/stripmime.pl \
>    | /home/mailman/mail/wrapper "$@"
>
>(Please don't flog me for using stripmime.pl.  I'm sure there are better
>MIME strippers out there, but it works fine for my needs. ;-)
>
>For the time being I'm just using my own training database which is a
>superset of what goes to that particular mailing list.
>
>The "bright idea" I had today was that it would be great to simply modify
>the above pipeline to
>
>    /usr/local/bin/hammie.py -f -d -p $BAYESHOME/hammie.db \
>    | tee /tmp/cedu-list-trainer \
>    | /usr/local/bin/stripmime.pl \
>    | /home/mailman/mail/wrapper "$@"
>
>and have the training stuff from pop3proxy waiting on a Unix named pipe
>named /tmp/cedu-list-trainer.  At my leisure I could then visit the web
>interface and train any collected messages.
>
>The "tee" command could be replaced by a simple little tee-like program
>which disposed of the file in some other fashion, perhaps by using HTTP PUT
>to toss it at the training server.
>
>Any thoughts on this?  Richie?

The training code used by pop3proxy is already split out into Corpus.py and
FileCorpus.py.  These modules probably don't do exactly what you need right
now, but we've been considering rewriting them anyway so they can handle
messages that aren't stored as files on disk.  You might take a look at
those modules.  I have some ideas about the rewrite, and Mark Hammond has
some requirements for it as well.
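
For the tee-like helper Skip mentions above, here is a rough sketch of what
it might look like.  It is only an illustration: the function names, the
Content-Type header and whatever URL you hand it are all made up, and nothing
here is an existing spambayes interface.  It assumes a training server that
accepts an HTTP PUT of the raw message, which doesn't exist yet.  The message
is written to stdout first, and a failed upload is only logged, so a dead
trainer never blocks delivery to Mailman.

```python
#!/usr/bin/env python3
"""tee-like filter: pass a message through unchanged, hand a copy to a sink."""
import sys
import urllib.request


def http_put_sink(url, timeout=10):
    """Return a callable that uploads bytes to `url` with HTTP PUT.

    The URL is whatever the (hypothetical) training server listens on.
    """
    def put(data):
        req = urllib.request.Request(
            url, data=data, method="PUT",
            headers={"Content-Type": "message/rfc822"})
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    return put


def tee_message(src, dst, sink):
    """Copy all of src to dst, then pass the same bytes to sink.

    Returns True if the sink accepted the copy, False if it raised.
    The data is written to dst before the sink is called, so a sink
    failure cannot lose the message.
    """
    data = src.read()
    dst.write(data)
    dst.flush()
    try:
        sink(data)
    except Exception as exc:
        # Log and carry on; the mail pipeline matters more than training.
        sys.stderr.write("trainer copy failed: %s\n" % exc)
        return False
    return True


# Does nothing unless exactly one argument (the target URL) is given.
if __name__ == "__main__" and len(sys.argv) == 2:
    tee_message(sys.stdin.buffer, sys.stdout.buffer,
                http_put_sink(sys.argv[1]))
```

In the pipeline it would take the place of the "tee /tmp/cedu-list-trainer"
line, with the trainer's URL as its only argument.  Because the sink is just
a callable, the same tee_message() could write to a named pipe or a spool
directory instead of doing a PUT.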

>
>Thx,
>
>Skip
>
>


c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org




