[Tracker-discuss] spam auditor checked in
forsberg at efod.se
Wed Jul 25 13:08:49 CEST 2007
skip at pobox.com wrote:
> Erik> The xmlrpc server has been installed on psf.upfronthosting.co.za
> Erik> as detailed in your message, using a cvs checkout from an hour
> Erik> ago. Seems to work.
> Note that we switched from CVS to Subversion a couple days ago. I don't
> think there are any significant differences yet (only my trivial test
> checkins), but you should track the Subversion repository.
Ah. Good thing :-). http://spambayes.sourceforge.net/download.html needs
an update, though.
> Erik> Now I think it needs training. Ideas on how to do that?
> Yes, there are two ways to train. First, there are train and train_mime
> methods in the XML-RPC server. Second, and certainly more convenient to
> start with,
I'm a programmer. For me, an xmlrpc interface is always more convenient
than a web interface :-).
> point your web browser at the URL the server displays when it
> starts up, probably http://localhost:8880/.
I got that running, yes. And I fully agree that it's better if the
spambayes server is running on localhost, as we don't want too many
external dependencies. As it's now up and running on localhost, feel free
to turn off the instance on www.webfast.com.
> Erik> Also, I'm a bit confused on how the detector works - could you
> Erik> explain the arguments the XMLRPC method expects? Is the first
> Erik> argument supposed to be a string, or something else?
> The score method takes three arguments, a dictionary representing the form
> submission contents, a possibly empty list of extra tokens which you
> generate, and a list of attachment dictionaries. See the docstring for
Ah! Now I understand how it works. I was looking in
scripts/sb_xmlrpcserver.py which is installed in the bin/ directory. I
should have been looking in XMLRPCPlugin.py. Is sb_xmlrpcserver.py
perhaps deprecated and on the list of things to be removed?
> I also put my test script on the webfast server:
> My intention is that file uploads are transferred in the attachments
> dictionary as compound data while the normal form data are transferred in
> the form dictionary. The extra_tokens list should consist of synthetic
> tokens your detector generates, such as "user:anonymous" or "user:skip" to
> indicate the login status or "userage:N" where N is something like the log
> of the number of seconds since the logged in user was registered.
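
To make the argument layout concrete, here is a rough sketch of how a detector might assemble the three arguments for score. The helper names, the form keys, and the choice of a base-10 log truncated to an integer are all assumptions for illustration; the mail only says "something like the log". The layout of an attachment dictionary is not spelled out above, so the sketch passes an empty list.

```python
import math
import time


def build_extra_tokens(username=None, registered_at=None, now=None):
    """Build synthetic tokens describing who submitted the form.

    Hypothetical helper: the token spellings follow the examples in the
    mail, but the base-10 log and the truncation are assumptions.
    """
    if username is None:
        return ["user:anonymous"]
    tokens = ["user:%s" % username]
    if registered_at is not None:
        now = time.time() if now is None else now
        # Clamp to one second so the logarithm is always defined.
        age_seconds = max(now - registered_at, 1.0)
        tokens.append("userage:%d" % int(math.log10(age_seconds)))
    return tokens


def build_score_args(form, attachments=(), username=None,
                     registered_at=None, now=None):
    """Return the (form, extra_tokens, attachments) triple for score()."""
    extra_tokens = build_extra_tokens(username, registered_at, now)
    return dict(form), extra_tokens, [dict(a) for a in attachments]


# Example: a user who registered one day (86400 seconds) ago.
# The form keys "title" and "note" are made up for this sketch.
form, extra_tokens, attachments = build_score_args(
    {"title": "Crash on startup", "note": "Traceback attached."},
    username="skip", registered_at=0, now=86400)
print(extra_tokens)
```

The resulting triple could then be passed to the server with something like xmlrpc.client.ServerProxy(url).score(form, extra_tokens, attachments), where url is whatever address the XML-RPC server listens on.
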
> One thing I'm unclear how to do is to recover from a submission which is
> misclassified as spam. You somehow need to recover the contents of that
> form from somewhere and resubmit the contents. I sort of think this has to
> happen in the detector.
Hmm.. In a complete system, I think it should work as follows:
*) An attribute, 'spambayes_score', is added to the file and msg classes
(in schema.py). Guess what this attribute will hold.. :-). A boolean
attribute 'spambayes_misclassified' should also be added.
*) A detector is added that reacts to instances of the file and msg
classes. When it fires, it contacts the Spambayes XMLRPC Server and
gets a score based on the contents and some synthetic tokens.
*) The web pages of the tracker should be modified to not display file
and msg instances that are classified as spam for anonymous users.
Instead a message should be displayed that tells the user that the file
or msg has been classified as spam, and that the user should login and
press a button to alert a coordinator if the message is incorrectly
classified.
*) The web pages should, for logged-in users, display a button that
allows ordinary users to alert administrators that a msg/file is
misclassified, by setting the 'spambayes_misclassified' attribute. A
detector should send mail to coordinators when this happens.
*) For coordinators, the web pages should provide buttons for "train as
ham" and "train as spam", and when one of these is pressed, the
'spambayes_misclassified' bool should be set to false. For the training
buttons to work, one or two new web actions are needed. They are written
as python scripts in the extensions directory of the tracker.
*) The detectors sending e-mail to various e-mail lists (and to the nosy
list) should not send mail when a message is classified as spam.
However, if a message was misclassified as spam, they should in an ideal
world re-send the message when the message is retrained as ham. The
latter might be tricky, though.
*) Issues that only have msg/file instances that are spam should
probably not be displayed in the tracker.
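
As a rough sketch of the flow in the list above (score on creation, hide spam from anonymous users, clear the flag on retraining), something along these lines might do. It deliberately avoids the roundup API and works on plain dictionaries; the function names, the 0.9 cutoff, the "content" key, the assumption that score returns a float, and the (form, is_spam) shape of the training call are all made up for illustration.

```python
import xmlrpc.client

SPAM_CUTOFF = 0.9  # assumed threshold; tune it against the trained database


def audit_spam_score(newvalues, score_fn, extra_tokens=()):
    """Store a spam score on the values of a new msg or file.

    score_fn stands in for the server's score method, for example
    xmlrpc.client.ServerProxy(url).score.
    """
    form = {"content": newvalues.get("content", "")}
    try:
        score = score_fn(form, list(extra_tokens), [])
    except (OSError, xmlrpc.client.Error):
        # Server unreachable or faulted: leave the item unscored
        # rather than blocking the submission.
        return newvalues
    newvalues["spambayes_score"] = score
    newvalues["spambayes_misclassified"] = False
    return newvalues


def visible_to(values, is_anonymous):
    """Hide items scored as spam from anonymous users only."""
    score = values.get("spambayes_score")
    if score is None:
        return True
    return not (is_anonymous and score >= SPAM_CUTOFF)


def train(values, train_fn, is_spam):
    """Coordinator action: retrain, then clear the misclassified flag.

    train_fn is a hypothetical callable taking (form, is_spam); the real
    train and train_mime signatures are not described in this thread.
    """
    train_fn({"content": values.get("content", "")}, is_spam)
    values["spambayes_misclassified"] = False
    return values
```

In a real tracker the first function would be registered as a detector on the msg and file classes, and the last one would sit behind the "train as ham" and "train as spam" web actions.
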
This is quite a lot of work, of course, especially if you're new to
roundup. Let me think about this to see if we can come up with