[Tracker-discuss] spam auditor checked in
forsberg at efod.se
Wed Jul 25 17:28:56 CEST 2007
skip at pobox.com wrote:
> Erik> *) An attribute, 'spambayes_score', is added to the file and msg
> Erik> classes (in schema.py). Guess what this attribute will
> Erik> hold.. :-). A boolean attribute 'spambayes_misclassified' should
> Erik> also be added.
> When do you know it's been misclassified? My thought would be that you have
> to save all submissions which score as spam for some period of time,
> probably with some unique identifier (an incrementing counter would be
> sufficient). That unique identifier has to propagate to the SpamBayes
> server. Later on, if you determine that a submission was misclassified, you
> use that unique id to retrieve the info you saved and pump it into the
My idea was to set it to False for all file/msg instances that have been
successfully classified, and then add a button that allows ordinary
users to tag the file/msg as misclassified, which would allow a
coordinator to visit the message and press either a 'mark as spam' or a
'mark as ham' button. The former would set spambayes_score to 1.0 and
submit the message for training as spam. The latter would set
spambayes_score to 0.0 and submit the message for training as ham. Both
would clear the spambayes_misclassified flag (set it to False).
Does this sound reasonable to you?
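To make the proposal concrete, here is a rough sketch of the flow in plain Python. It is not actual Roundup or SpamBayes code: the Submission class, the FakeTrainer class and all the function names are made up for illustration, and only the two attribute names come from the proposal above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Submission:
    """Stand-in for a Roundup file/msg node (hypothetical, not the Roundup API)."""
    content: str
    spambayes_score: Optional[float] = None
    spambayes_misclassified: bool = False


class FakeTrainer:
    """Hypothetical stand-in for whatever submits messages to SpamBayes."""

    def __init__(self) -> None:
        self.trained: List[Tuple[str, bool]] = []

    def train(self, content: str, is_spam: bool) -> None:
        # A real implementation would talk to the SpamBayes server here.
        self.trained.append((content, is_spam))


def record_score(sub: Submission, score: float) -> None:
    """Successful classification: store the score, flag starts as False."""
    sub.spambayes_score = score
    sub.spambayes_misclassified = False


def flag_misclassified(sub: Submission) -> None:
    """Ordinary user presses the 'misclassified' button."""
    sub.spambayes_misclassified = True


def mark_as_spam(sub: Submission, trainer: FakeTrainer) -> None:
    """Coordinator presses 'mark as spam'."""
    sub.spambayes_score = 1.0
    trainer.train(sub.content, is_spam=True)
    sub.spambayes_misclassified = False


def mark_as_ham(sub: Submission, trainer: FakeTrainer) -> None:
    """Coordinator presses 'mark as ham'."""
    sub.spambayes_score = 0.0
    trainer.train(sub.content, is_spam=False)
    sub.spambayes_misclassified = False
```

The point of the sketch is that the flag only marks "needs a coordinator's attention"; the score itself is only overwritten when a coordinator makes the final call.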
> I would hide all submissions which score as spam, whether anonymous or
> known. Only admins should be able to see spam submissions.
Yeah, that's probably the best way to do it.
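As a sketch of what that visibility rule could look like (plain Python, not Roundup's permission API; the function name, the 0.9 cutoff and the choice to show unscored submissions are all assumptions of mine):

```python
from typing import Optional

# Assumed cutoff above which a submission counts as spam; the real value
# would have to be chosen when configuring the tracker.
SPAM_CUTOFF = 0.9


def is_visible(score: Optional[float], viewer_is_admin: bool,
               cutoff: float = SPAM_CUTOFF) -> bool:
    """Return True if the viewer may see a submission with this score."""
    if viewer_is_admin:
        # Admins see everything, including spam.
        return True
    if score is None:
        # Assumption: submissions that have not been scored yet are shown.
        return True
    # Everyone else, anonymous or known, only sees non-spam.
    return score < cutoff
```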
> Erik> This is quite a lot of work, of course, especially if you're new to
> Erik> roundup. Let me think about this to see if we can come up with
> Erik> something simpler.
> Yeah, that's pretty much beyond my capability. I simply don't have the time
> to become a Roundup expert.
Well, I'll see if I can find the time to do some of the work. Depends a
bit on the weather.. :-). I'll be very happy if you can contribute
some of your knowledge by inspecting my code and answering my questions.
It's been a while since I did anti-spam stuff. Fiddled a lot with SMTP
filters and spamassassin some years ago. This feature wakes up some of
the interest I had in the subject.
On the matter of training - will spambayes work best if it gets trained
on about the same amount of spam messages as ham messages? That is, if
we're training it on 5 spam messages, should we make sure we also train
it on 5 ham messages?