[Tracker-discuss] spam auditor checked in
skip at pobox.com
Wed Jul 25 15:20:32 CEST 2007
>> Note that we switched from CVS to Subversion a couple days ago. I
>> don't think there are any significant differences yet (only my
>> trivial test checkins), but you should track the Subversion
>> repository.
Erik> Ah. Good thing :-). http://spambayes.sourceforge.net/download.html
Erik> needs an update, though.
Thanks, I'll fix that up.
Erik> Now I think it needs training. Ideas on how to do that?
>>
>> Yes, there are two ways to train. First, there are train and
>> train_mime methods in the XML-RPC server. Second, and certainly more
>> convenient to start with,
Erik> I'm a programmer. For me, an xmlrpc interface is always more
Erik> convenient than a web interface :-).
>> point your web browser at the URL the server displays when it
>> starts up, probably http://localhost:8880/.
Erik> I got that running, yes. And I fully agree that it's better if the
Erik> spambayes server is running on localhost, as we don't want too many
Erik> external dependencies. As it's now up and running on localhost, feel free
Erik> to turn off the instance on www.webfast.com.
Erik> Also, I'm a bit confused on how the detector works - could you
Erik> explain the arguments the XMLRPC method expects? Is the first
Erik> argument supposed to be a string, or something else?
>>
>> The score method takes three arguments, a dictionary representing the form
>> submission contents, a possibly empty list of extra tokens which you
>> generate, and a list of attachment dictionaries. See the docstring for
>> spambayes.XMLRPCPlugin.form_to_mime.
>>
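To make the three-argument call concrete, here is a minimal sketch. Only the
method name score and the order of its arguments come from the description
above. The helper score_submission, the FakeServer stand-in, the form field
names, and the idea that the return value is a number are all assumptions for
illustration. The real layout of the attachment dictionaries is documented in
the form_to_mime docstring, so the sketch passes an empty list there.

```python
def score_submission(server, form, extra_tokens=None, attachments=None):
    """Call score() with the three arguments described in the thread.

    `server` is anything exposing score(form, tokens, attachments), for
    example an xmlrpc.client.ServerProxy pointed at the SpamBayes server.
    This helper is hypothetical and not part of SpamBayes.
    """
    # XML-RPC structs need string keys; values are stringified here
    # as a simplifying assumption.
    form = {str(k): str(v) for k, v in form.items()}
    tokens = [str(t) for t in (extra_tokens or [])]
    files = list(attachments or [])  # attachment dicts, passed through as-is
    return server.score(form, tokens, files)


class FakeServer:
    """Offline stand-in for the XML-RPC proxy; returns a fixed score."""

    def __init__(self, fixed_score=0.5):
        self.fixed_score = fixed_score
        self.calls = []

    def score(self, form, extra_tokens, attachments):
        # Record the call so the argument shapes can be inspected.
        self.calls.append((form, extra_tokens, attachments))
        return self.fixed_score
```

Against a live server, FakeServer would be swapped for a proxy object such as
xmlrpc.client.ServerProxy(url), with url being whatever address your
installation serves XML-RPC on.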
Erik> Ah! Now I understand how it works. I was looking in
Erik> scripts/sb_xmlrpcserver.py which is installed in the bin/ directory. I
Erik> should have been looking in XMLRPCPlugin.py. Is sb_xmlrpcserver.py
Erik> perhaps deprecated and on the list of things to be removed?
Yeah, it's kind of ancient. I'm not aware of anyone who uses it these days.
It does have the advantage of being more lightweight than the core_server
(no web stuff).
Erik> *) An attribute, 'spambayes_score', is added to the file and msg
Erik> classes (in schema.py). Guess what this attribute will
Erik> hold.. :-). A boolean attribute 'spambayes_misclassified' should
Erik> also be added.
When do you know it's been misclassified? My thought would be that you have
to save all submissions which score as spam for some period of time,
probably with some unique identifier (an incrementing counter would be
sufficient). That unique identifier has to propagate to the SpamBayes
server. Later on, if you determine that a submission was misclassified, you
use that unique id to retrieve the info you saved and pump it into the
tracker.
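
The bookkeeping described above could look roughly like the following. This is
only a sketch of the idea (incrementing counter, keep for a while, look up by
id later). The class name SpamHoldingArea, the one-week default, and the
in-memory dict are assumptions; a real tracker would need persistent storage.

```python
import itertools
import time


class SpamHoldingArea:
    """Keep spam-scoring submissions for a while, keyed by a counter id."""

    def __init__(self, max_age_seconds=7 * 24 * 3600, clock=time.time):
        self._ids = itertools.count(1)   # incrementing unique identifier
        self._held = {}                  # uid -> (saved_at, submission)
        self._max_age = max_age_seconds
        self._clock = clock

    def hold(self, submission):
        """Save a submission that scored as spam and return its id."""
        uid = next(self._ids)
        self._held[uid] = (self._clock(), submission)
        return uid

    def expire(self):
        """Drop entries older than max_age_seconds; return how many."""
        cutoff = self._clock() - self._max_age
        stale = [uid for uid, (saved_at, _) in self._held.items()
                 if saved_at < cutoff]
        for uid in stale:
            del self._held[uid]
        return len(stale)

    def release(self, uid):
        """Remove and return a saved submission, or None if gone."""
        self.expire()
        entry = self._held.pop(uid, None)
        return None if entry is None else entry[1]
```

A submission found to be misclassified would be fetched with release(uid) and
then fed back into the tracker by whatever means the tracker provides.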
Erik> *) A detector is added that reacts on instances of the file and
Erik> msg classes. When it fires, it contacts the Spambayes XMLRPC
Erik> Server and gets a score based on the contents and some synthetic
Erik> tokens.
Yup.
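
For illustration, such a detector might be shaped like the sketch below. It
uses the usual Roundup auditor signature (db, cl, nodeid, newvalues) and the
audit() registration call. Everything else is an assumption: the factory
make_spam_auditor, the "class:..." synthetic token, the single-field form
dictionary, and scoring only the content property. The spambayes_score
property is the one proposed above and would have to exist in schema.py.

```python
def make_spam_auditor(score_fn):
    """Build an auditor around score_fn(form, tokens, attachments).

    score_fn is a hypothetical callable, e.g. the score method of an
    XML-RPC proxy for the SpamBayes server.
    """
    def audit_spam(db, cl, nodeid, newvalues):
        content = newvalues.get("content")
        if content is None:
            return  # nothing new to score in this change
        form = {"content": content}
        # One made-up synthetic token: which class the item belongs to.
        tokens = ["class:%s" % cl.classname]
        newvalues["spambayes_score"] = score_fn(form, tokens, [])
    return audit_spam


def register(db, auditor):
    """Attach the auditor to creation of msg and file items."""
    db.msg.audit("create", auditor)
    db.file.audit("create", auditor)
```

In a tracker, register() would be called from the detector module's init(db)
function; error handling for an unreachable server is left out here.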
Erik> *) The web pages of the tracker should be modified to not display
Erik> file and msg instances that are classified as spam for anonymous
Erik> users. Instead a message should be displayed that tells the user
Erik> that the file or msg has been classified as spam, and that the
Erik> user should log in and press a button to alert a coordinator if
Erik> the message is incorrectly classified.
I would hide all submissions which score as spam, whether anonymous or
known. Only admins should be able to see spam submissions.
...
Erik> This is quite a lot of work, of course, especially if you're new to
Erik> roundup. Let me think about this to see if we can come up with
Erik> something simpler.
Yeah, that's pretty much beyond my capability. I simply don't have the time
to become a Roundup expert.
Skip