[Tracker-discuss] spam auditor checked in

skip at pobox.com skip at pobox.com
Wed Jul 25 03:18:43 CEST 2007


    Erik> The xmlrpc server has been installed on psf.upfronthosting.co.za
    Erik> as detailed in your message, using a cvs checkout from an hour
    Erik> ago. Seems to work.

Note that we switched from CVS to Subversion a couple days ago.  I don't
think there are any significant differences yet (only my trivial test
checkins), but you should track the Subversion repository.  (I don't know
how to completely disable the CVS repository on SF.  Is it even possible?)

    Erik> Now I think it needs training. Ideas on how to do that?

Yes, there are two ways to train.  First, there are train and train_mime
methods in the XML-RPC server.  Second, and certainly more convenient to
start with, point your web browser at the URL the server displays when it
starts up, probably http://localhost:8880/.  (Try
http://www.webfast.com/sbmanage/ now.)  Your detector should probably be set
up to reject only those submissions which score as spam.
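
For the reject-only-spam policy, something like the following minimal sketch
might do.  It assumes the score comes back as a spam probability between 0 and
1 and borrows 0.90, the stock SpamBayes spam cutoff.  The function name is
made up for illustration and is not part of the plugin.

```python
# Hypothetical helper, not part of the SpamBayes XML-RPC plugin.
# Assumes the score is a spam probability in the range 0.0 to 1.0.
SPAM_CUTOFF = 0.90  # stock SpamBayes spam cutoff; tune for the tracker


def should_reject(score, spam_cutoff=SPAM_CUTOFF):
    """Return True only for submissions that score as spam.

    Anything below the cutoff (ham or unsure) is let through, so
    borderline submissions are not lost.
    """
    return score >= spam_cutoff
```

Letting the unsure submissions through keeps false rejections rare while the
classifier is still poorly trained.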

    Erik> The server at www.webfast.com gives me a 404.

Ah, yes, that wasn't running.  I've restarted it.  Note however that I was
unsuccessful getting the XML-RPC server running behind my Apache reverse
proxy.  My Apache chops are pretty rusty.  I was only working on getting the
server running on www.webfast.com because I didn't have direct access to the
tracker server.  If you can manage it, you'd be better off running it on the
same server as the tracker.  Only the web interface URL needs to be exposed
beyond localhost (so the tracker admins can train on the submissions), and
that should be protected by Apache authentication.

    Erik> Also, I'm a bit confused on how the detector works - could you
    Erik> explain the arguments the XMLRPC method expects? Is the first
    Erik> argument supposed to be a string, or something else?

The score method takes three arguments: a dictionary representing the form
submission contents, a possibly empty list of extra tokens which you
generate, and a list of attachment dictionaries.  See the docstring for
spambayes.XMLRPCPlugin.form_to_mime.
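
As a rough sketch of a call from the detector side (this uses Python 3's
xmlrpc.client; the module was xmlrpclib in Python 2).  The helper names are
invented for illustration, the server URL is whatever your XML-RPC server
listens on, and I'm assuming the return value is a numeric score.  The
attachments list defaults to empty here because its exact layout is
described in the form_to_mime docstring.

```python
import xmlrpc.client


def make_proxy(url):
    """Create a client for the XML-RPC server at url (no connection yet)."""
    return xmlrpc.client.ServerProxy(url)


def score_submission(proxy, form, extra_tokens=(), attachments=()):
    """Call the server's score method with its three arguments.

    proxy        -- an xmlrpc.client.ServerProxy, or a stand-in for testing
    form         -- dict mapping form field names to submitted values
    extra_tokens -- synthetic tokens generated by the detector (may be empty)
    attachments  -- list of attachment dictionaries (may be empty)
    """
    # Copy into plain dict/list objects so XML-RPC can marshal them.
    return proxy.score(dict(form), list(extra_tokens), list(attachments))
```

The form field names you pass are whatever the tracker's submission form
uses; nothing in the sketch depends on particular keys.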

I also put my test script on the webfast server:

    http://www.webfast.com/~skip/checkmimemsg.py

My intention is that file uploads are transferred in the attachments
list as compound data, while the normal form data are transferred in
the form dictionary.  The extra_tokens list should consist of synthetic
tokens your detector generates, such as "user:anonymous" or "user:skip" to
indicate the login status, or "userage:N", where N is something like the log
of the number of seconds since the logged-in user was registered.
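
A minimal sketch of generating those tokens follows.  The function name and
its parameters are hypothetical, and the natural log truncated to an integer
is just one assumed way to bucket account age.

```python
import math
import time


def make_extra_tokens(username=None, registered_at=None, now=None):
    """Build synthetic tokens describing who submitted the form.

    username      -- login name, or None for an anonymous submission
    registered_at -- registration time in seconds since the epoch, if known
    now           -- current time in seconds since the epoch
                     (defaults to time.time())
    """
    if username is None:
        return ["user:anonymous"]
    tokens = ["user:%s" % username]
    if registered_at is not None:
        if now is None:
            now = time.time()
        # Clamp to one second so the log is defined for brand-new accounts.
        age = max(now - registered_at, 1.0)
        # Natural log, truncated: an assumption, any coarse bucketing works.
        tokens.append("userage:%d" % int(math.log(age)))
    return tokens
```

For example, make_extra_tokens("skip", 0, 1000) gives
["user:skip", "userage:6"], and make_extra_tokens() gives ["user:anonymous"].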

One thing I'm unclear about is how to recover from a submission which is
misclassified as spam.  You need to retrieve the contents of that form
from somewhere and resubmit them.  I sort of think this has to
happen in the detector.

Skip


