[Tracker-discuss] spam auditor checked in

Erik Forsberg forsberg at efod.se
Wed Jul 25 17:28:56 CEST 2007


skip at pobox.com wrote:
>     Erik> *) An attribute, 'spambayes_score', is added to the file and msg
>     Erik> classes (in schema.py). Guess what this attribute will
>     Erik> hold.. :-). A boolean attribute 'spambayes_misclassified' should
>     Erik> also be added.
>
> When do you know it's been misclassified?  My thought would be that you have
> to save all submissions which score as spam for some period of time,
> probably with some unique identifier (an incrementing counter would be
> sufficient).  That unique identifier has to propagate to the SpamBayes
> server.  Later on, if you determine that a submission was misclassified, you
> use that unique id to retrieve the info you saved and pump it into the
> tracker.
>   
My idea was to set it to False for all file/msg instances that have been 
successfully classified, and then add a button that allows ordinary 
users to tag the file/msg as misclassified, which would allow a 
coordinator to visit the message and press either a 'mark as spam' or a 
'mark as ham' button. The former would set spambayes_score to 1.0 and 
submit the message for training as spam. The latter would set 
spambayes_score to 0.0 and submit the message for training as ham. Both 
would clear the spambayes_misclassified flag (set it to False).
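
To make the flow concrete, here is a rough sketch in plain Python. The 
Submission class, the function names and the train callback are 
placeholders made up for illustration; this is not Roundup's or 
SpamBayes' actual API, only the two attribute names come from the 
proposal above.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Submission:
    """Hypothetical stand-in for a Roundup file/msg node."""
    content: str
    spambayes_score: Optional[float] = None
    spambayes_misclassified: bool = False


def record_classification(sub: Submission, score: float) -> None:
    # After a successful classification, store the score and
    # make sure the misclassified flag is False.
    sub.spambayes_score = score
    sub.spambayes_misclassified = False


def flag_as_misclassified(sub: Submission) -> None:
    # The button for ordinary users: only raises the flag,
    # it does not touch the score or trigger any training.
    sub.spambayes_misclassified = True


def coordinator_mark(sub: Submission, is_spam: bool,
                     train: Callable[[str, bool], None]) -> None:
    # 'mark as spam' sets the score to 1.0, 'mark as ham' to 0.0.
    # Either way the content is submitted for training and the
    # flag is cleared. `train` is an assumed callback that would
    # talk to the SpamBayes server.
    sub.spambayes_score = 1.0 if is_spam else 0.0
    train(sub.content, is_spam)
    sub.spambayes_misclassified = False
```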

Does this sound reasonable to you?

> I would hide all submissions which score as spam, whether anonymous or
> known.  Only admins should be able to see spam submissions.
>   
Yeah, that's probably the best way to do it.
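
The visibility rule could be as small as the check below. This is only 
a sketch: the may_view name, the 0.9 cutoff and the choice to show 
not-yet-scored submissions are my assumptions, and in Roundup the check 
would have to be wired into the permission system rather than called 
directly.

```python
from typing import Optional

SPAM_CUTOFF = 0.9  # hypothetical threshold, to be tuned


def may_view(score: Optional[float], is_admin: bool,
             cutoff: float = SPAM_CUTOFF) -> bool:
    # Admins see everything, including submissions scored as spam.
    if is_admin:
        return True
    # Assumption: a submission that has not been scored yet is shown.
    if score is None:
        return True
    # Everyone else, anonymous or known, only sees non-spam.
    return score < cutoff
```
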
>     Erik> This is quite a lot of work, of course, especially if you're new to 
>     Erik> roundup. Let me think about this to see if we can come up with 
>     Erik> something simpler.
>
> Yeah, that's pretty much beyond my capability.  I simply don't have the time
> to become a Roundup expert.
>   
Well, I'll see if I can find the time to do some of the work. Depends a 
bit on the weather.. :-).  I'd be very happy if you could contribute 
some of your knowledge by inspecting my code and answering my questions.

It's been a while since I did anti-spam stuff. Fiddled a lot with SMTP 
filters and spamassassin some years ago. This feature wakes up some of 
the interest I had in the subject.

On the matter of training - will spambayes work best if it gets trained 
on about the same number of spam messages as ham messages? That is, if 
we're training it on 5 spam messages, should we make sure we also train 
it on 5 ham messages?

Regards,
\EF

