[Tracker-discuss] spam auditor checked in

skip at pobox.com
Wed Jul 25 15:20:32 CEST 2007


    >> Note that we switched from CVS to Subversion a couple days ago.  I
    >> don't think there are any significant differences yet (only my
    >> trivial test checkins), but you should track the Subversion
    >> repository.
    Erik> Ah. Good thing :-). http://spambayes.sourceforge.net/download.html
    Erik> needs an update, though.

Thanks, I'll fix that up.

    Erik> Now I think it needs training. Ideas on how to do that?
    >> 
    >> Yes, there are two ways to train.  First, there are train and
    >> train_mime methods in the XML-RPC server.  Second, and certainly more
    >> convenient to start with,
    Erik> I'm a programmer. For me, an xmlrpc interface is always more
    Erik> convenient than a web interface :-).

    >> point your web browser at the URL the server displays when it
    >> starts up, probably http://localhost:8880/.
    Erik> I got that running, yes. And I fully agree that it's better if the 
    Erik> spambayes server is running on localhost, as we don't want too many 
    Erik> external dependencies. As it's now up and running on localhost, feel free
    Erik> to turn off the instance on www.webfast.com.

    Erik> Also, I'm a bit confused on how the detector works - could you
    Erik> explain the arguments the XMLRPC method expects? Is the first
    Erik> argument supposed to be a string, or something else?
    >> 
    >> The score method takes three arguments, a dictionary representing the form
    >> submission contents, a possibly empty list of extra tokens which you
    >> generate, and a list of attachment dictionaries.  See the docstring for
    >> spambayes.XMLRPCPlugin.form_to_mime.
    >> 
    Erik> Ah! Now I understand how it works. I was looking in 
    Erik> scripts/sb_xmlrpcserver.py which is installed in the bin/ directory. I 
    Erik> should have been looking in XMLRPCPlugin.py. Is sb_xmlrpcserver.py 
    Erik> perhaps deprecated and on the list of things to be removed?

Yeah, it's kind of ancient.  I'm not aware of anyone who uses it these days.
It does have the advantage of being more lightweight than the core_server
(no web stuff).
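
A rough sketch of the score call described above, using Python's standard
xmlrpc.client module. The method name (score) and its three arguments come
from the description above; the endpoint URL and the example field names and
token are assumptions for illustration only. Attachment dictionaries are left
empty because their layout is documented in the form_to_mime docstring, not in
this message.

```python
import xmlrpc.client


def connect(url="http://localhost:8880/"):
    # Assumption: the XML-RPC endpoint is reachable at the address the server
    # prints at startup; adjust the URL/path for your installation.
    return xmlrpc.client.ServerProxy(url)


def score_submission(server, form, extra_tokens=None, attachments=None):
    """Ask the server to score one form submission.

    form         -- dict of form field names to values
    extra_tokens -- possibly empty list of tokens generated by the caller
    attachments  -- list of attachment dicts (layout: see form_to_mime)
    """
    # The server expects all three arguments, so missing ones become [].
    return server.score(form, list(extra_tokens or []), list(attachments or []))
```

With a server running, something like
score_submission(connect(), {"content": "some text"}, ["klass:msg"]) should
return the score; the "content" key and the "klass:msg" token are made-up
examples.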

    Erik> *) An attribute, 'spambayes_score', is added to the file and msg
    Erik> classes (in schema.py). Guess what this attribute will
    Erik> hold.. :-). A boolean attribute 'spambayes_misclassified' should
    Erik> also be added.

When do you know it's been misclassified?  My thought would be that you have
to save all submissions which score as spam for some period of time,
probably with some unique identifier (an incrementing counter would be
sufficient).  That unique identifier has to propagate to the SpamBayes
server.  Later on, if you determine that a submission was misclassified, you
use that unique id to retrieve the info you saved and pump it into the
tracker.
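
The bookkeeping described above could look roughly like the following
in-memory sketch. The class name, the one-week retention period, and the
shape of the saved submission are all assumptions; a real tracker would
presumably persist this somewhere rather than keep it in a dict.

```python
import itertools
import time


class PendingSubmissions:
    """Hold spam-scored submissions for a while, keyed by a counter id."""

    def __init__(self, max_age_seconds=7 * 24 * 3600):
        self._ids = itertools.count(1)   # incrementing unique identifier
        self._saved = {}                 # uid -> (saved_at, submission)
        self._max_age = max_age_seconds

    def save(self, submission, now=None):
        """Store a submission and return the id to pass along with it."""
        uid = next(self._ids)
        self._saved[uid] = (time.time() if now is None else now, submission)
        return uid

    def retrieve(self, uid):
        """Remove and return the submission for uid, or None if it is gone."""
        entry = self._saved.pop(uid, None)
        return None if entry is None else entry[1]

    def expire(self, now=None):
        """Drop entries older than the retention period; return the count."""
        now = time.time() if now is None else now
        stale = [uid for uid, (saved_at, _) in self._saved.items()
                 if now - saved_at > self._max_age]
        for uid in stale:
            del self._saved[uid]
        return len(stale)
```

When a submission turns out to be misclassified, retrieve(uid) hands back
what was saved so it can be retrained on and resubmitted.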

    Erik> *) A detector is added that reacts on instances of the file and
    Erik> msg classes. When it fires, it contacts the Spambayes XMLRPC
    Erik> Server and gets a score based on the contents and some synthetic
    Erik> tokens.

Yup.
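
A minimal sketch of such a detector, under several assumptions: that it is
registered as a Roundup auditor on creation of file and msg items, that the
text to score is available as newvalues["content"], and that a "klass:..."
string is a reasonable synthetic token. The helper names (make_spam_auditor,
register) and the injected score_fn are hypothetical; in a real tracker the
registration would presumably happen in the detector module's init(db).

```python
def make_spam_auditor(score_fn):
    """Build an auditor that stores a spam score on each new item.

    score_fn(form, extra_tokens, attachments) is expected to return the
    score, for example by calling the SpamBayes XML-RPC server.
    """
    def check_spam(db, cl, nodeid, newvalues):
        # Assumption: the submitted text arrives under the "content" key.
        form = {"content": newvalues.get("content", "")}
        # Synthetic token recording which class the item belongs to.
        tokens = ["klass:%s" % cl.classname]
        newvalues["spambayes_score"] = score_fn(form, tokens, [])
    return check_spam


def register(db, score_fn):
    """Attach the auditor to the file and msg classes."""
    auditor = make_spam_auditor(score_fn)
    for cl in (db.file, db.msg):
        cl.audit("create", auditor)
```

Passing the scoring function in keeps the detector independent of how the
server is reached, which also makes it easy to try out without a server.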

    Erik> *) The web pages of the tracker should be modified to not display
    Erik> file and msg instances that are classified as spam for anonymous
    Erik> users.  Instead a message should be displayed that tells the user
    Erik> that the file or msg has been classified as spam, and that the
    Erik> user should log in and press a button to alert a coordinator if
    Erik> the message is incorrectly classified.

I would hide all submissions which score as spam, whether anonymous or
known.  Only admins should be able to see spam submissions.
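
As a small illustration of that rule (the 0.9 cutoff and the function name
are assumptions, not something either of us has settled on):

```python
SPAM_CUTOFF = 0.9  # assumed threshold above which a submission counts as spam


def may_view(score, is_admin, cutoff=SPAM_CUTOFF):
    """Return True if a user may see a submission with this spam score."""
    if score is None or score < cutoff:
        return True       # unscored or non-spam: visible to everyone
    return is_admin       # spam: visible to admins only
```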

    ...

    Erik> This is quite a lot of work, of course, especially if you're new to 
    Erik> roundup. Let me think about this to see if we can come up with
    Erik> something simpler.

Yeah, that's pretty much beyond my capability.  I simply don't have the time
to become a Roundup expert.

Skip

