GUI hacking for SA integration


Jeff Warnica

26 Nov 2003

9:33 p.m.

I have a couple of questions.

First: if I were to submit a (complete) set of patches that allow at least some per-list configuration for SpamAssassin integration, would they be accepted?

As a first attempt I would be concerned only with setting the score and the actions to take. It would likely be hard-coded to three possible actions: discard, hold for moderation, or pass through, as I can't imagine that anyone would want to bounce spam.

And on doing the actual work: which version should I be working with, CVS or the 2.1 branch? What would be the closest thing I could clone as a starting point? Topics seem like a possible thing to examine. Also, I'm not a Python hacker, though I've used a lot of different languages. Are there any particular gotchas in either Python or Mailman that I should be aware of?

Thanks.


PieterB

27 Nov

9:47 a.m.

On Wed, Nov 26, 2003 at 04:33:45PM -0400, Jeff Warnica wrote:

...

First: if I were to submit a (complete) set of patches that allow at least some per-list configuration for SpamAssassin integration, would they be accepted?

I hope those patches will be accepted! Also see my patch mentioned in http://www.mail-archive.com/mailman-developers@python.org/msg06837.html. It would be great if the MM 2.1.4 distribution contained the files needed for SpamAssassin integration. It could be disabled by default, and some docs should be added on how to install it (or should it only be mentioned in the Mailman FAQ?).

I think Mailman/SA integration should be considered in two places:

Enabling/disabling SA integration and setting scores

I think Privacy Options | Spam Filters should have three more options:

Use SpamAssassin headers: yes/no

SpamAssassin Hold score: (default 5)

SpamAssassin Discard score: (default 10)

If the scores are left empty, the default score of the mailman-list should be used.
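The three proposed options, including the fallback when a score field is left empty, could be resolved roughly as follows. This is a minimal sketch, assuming an empty per-list field is stored as `None`; `DEFAULT_SA_HOLD_SCORE`, `DEFAULT_SA_DISCARD_SCORE` and `sa_action` are hypothetical names, not existing Mailman settings.

```python
# Hypothetical site-wide defaults (illustrative names, not real mm_cfg settings).
DEFAULT_SA_HOLD_SCORE = 5.0
DEFAULT_SA_DISCARD_SCORE = 10.0


def sa_action(score, use_sa_headers=True, hold_score=None, discard_score=None):
    """Map a SpamAssassin score to 'accept', 'hold' or 'discard'.

    hold_score / discard_score are the per-list settings; None means the
    list admin left the field empty, so the site default is used instead.
    """
    if not use_sa_headers:
        return "accept"
    hold = DEFAULT_SA_HOLD_SCORE if hold_score is None else hold_score
    discard = DEFAULT_SA_DISCARD_SCORE if discard_score is None else discard_score
    # Check the stricter threshold first so discard wins over hold.
    if score >= discard:
        return "discard"
    if score >= hold:
        return "hold"
    return "accept"
```

With the defaults, a score of 6.2 is held and 12.0 is discarded; a list that raises its discard score to 15 would hold that same 12.0 message instead.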

Moderating messages

Integrating moderation of spam messages that are held in 'admindb'.

Jeff wrote:

...

It would likely be hard-coded to three possible actions: discard, hold for moderation, or pass through, as I can't imagine that anyone would want to bounce spam.

I don't understand this. There are currently four options in 'admindb' moderation: defer/approve/reject/discard. It might be best to have a fifth option, "discard spam", if SpamAssassin integration is enabled on the list. That would make it possible to integrate SpamBayes learning or spam reporting for those messages (I can imagine that bouncing spam to a spamtrap is useful in some setups as well). On the other hand, it might be confusing for moderators to have two 'discard' options, and I doubt that many non-spam messages are discarded by moderators. Thoughts?
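The proposed fifth moderation action could be modeled like this. It is only a sketch: `ModAction`, `available_actions`, `moderate` and the `train_spam` callback are hypothetical and not part of Mailman's admindb; the callback stands in for SpamBayes training or forwarding to a spamtrap.

```python
from enum import Enum


class ModAction(Enum):
    DEFER = "defer"
    APPROVE = "approve"
    REJECT = "reject"
    DISCARD = "discard"
    DISCARD_SPAM = "discard spam"  # proposed fifth option


def available_actions(sa_enabled):
    """Offer the fifth action only when spam filtering is enabled for the list."""
    actions = [ModAction.DEFER, ModAction.APPROVE, ModAction.REJECT, ModAction.DISCARD]
    if sa_enabled:
        actions.append(ModAction.DISCARD_SPAM)
    return actions


def moderate(action, msg, train_spam=None):
    """Return the message's fate; DISCARD_SPAM also hands it to a trainer.

    train_spam is a hypothetical callback (e.g. Bayesian training or a
    spamtrap forwarder); plain DISCARD never calls it.
    """
    if action is ModAction.DISCARD_SPAM:
        if train_spam is not None:
            train_spam(msg)
        return "discarded"
    return {
        ModAction.DEFER: "deferred",
        ModAction.APPROVE: "approved",
        ModAction.REJECT: "rejected",
        ModAction.DISCARD: "discarded",
    }[action]
```

Keeping plain "discard" separate means moderators who discard off-topic but legitimate posts do not accidentally train the filter on them as spam.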

I'm willing to help you with the SA integration. I haven't heard Barry's opinion on integrating SpamAssassin and Mailman.

Regards,

Pieter

-- http://zwiki.org/PieterB

Barry Warsaw

4:46 p.m.

On Thu, 2003-11-27 at 03:47, PieterB wrote:

...

It would be great if the MM 2.1.4 distribution contained the files needed for SpamAssassin integration.

We can't really add this to 2.1 because it would be a new feature.

...

I'm willing to help you with the SA integration. I haven't heard Barry's opinion on integrating SpamAssassin and Mailman.

For the next version of Mailman, I'd prefer to see something more generic if possible. That way a site could add SA, SB[1], or some other system. It may be too hard to do this since there are no standards here, but even then, I'd like to see something pluggable rather than tightly integrated.

-Barry

[1] I have a Spambayes patch on SF and I think Simone was working on that. I've been using the latest SB with my Evolution client and have been very impressed, although it does take a little bit of training.

Simone Piunno

5:24 p.m.

On Thursday 27 November 2003 at 16:46, Barry Warsaw wrote:

...

For the next version of Mailman, I'd prefer to see something more generic if possible. That way a site could add SA, SB[1], or some other system. It may be too hard to do this since there are no standards here, but even then, I'd like to see something pluggable rather than tightly integrated.

I'd be happy to help code this pluggable filter, but I'd need your help with the interface design. I believe the biggest problems are:

How do you plan to weight different filters (e.g. using coefficients instead of a rigid, first-match-wins pipeline)?
How do we plug in the UI?
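To make the difference between the two weighting styles concrete, here is a small sketch. The function names, the filter names `sa`/`sb` and the cutoffs 0.2 and 0.9 are illustrative assumptions; the three-way good/unsure/spam split mirrors the categories used in the use cases that follow.

```python
def first_match_wins(verdicts):
    """Rigid pipeline: the first filter that returns a verdict decides.

    `verdicts` is an ordered list of 'spam' / 'good' / None (no opinion).
    """
    for verdict in verdicts:
        if verdict is not None:
            return verdict
    return None


def weighted_score(probs, weights):
    """Coefficient-based combination of per-filter spam probabilities.

    `probs` maps filter name -> probability in [0, 1]; `weights` maps filter
    name -> coefficient. Returns the weighted mean, also in [0, 1].
    """
    total = sum(weights[name] for name in probs)
    if total == 0:
        return 0.0
    return sum(weights[name] * p for name, p in probs.items()) / total


def classify(score, ham_cutoff=0.2, spam_cutoff=0.9):
    """Three-way split into good / unsure / spam (cutoffs are illustrative)."""
    if score >= spam_cutoff:
        return "spam"
    if score < ham_cutoff:
        return "good"
    return "unsure"
```

With first-match-wins, a confident first filter silences the rest. With weights of 1.0 for `sa` (0.9) and 3.0 for `sb` (0.5), the combined score is 0.6, which lands in "unsure" and would send the message to the moderation queue.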

I'd also like to have a per-list FIFO queue of pristine copies of all messages received, with a configurable max size, so that when a message is mis-categorized as either good or spam (not unsure) we have a chance to train on the pristine version (not decorated, not header-cooked, not scrubbed, and so on) through the admin interface.
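A size-bounded FIFO of pristine copies might look like the sketch below. It assumes the raw message text is captured before any handler touches it and that copies are keyed by their Message-ID header (so an admin could paste the id, as in use case A); `PristineQueue` is a hypothetical class, not Mailman code.

```python
import email
from collections import OrderedDict


class PristineQueue:
    """Per-list FIFO of unmodified messages, capped at max_size entries."""

    def __init__(self, max_size):
        self.max_size = max_size
        self._items = OrderedDict()  # Message-ID -> raw message text

    def add(self, raw_text):
        """Store the untouched text; evict the oldest copies once full."""
        msgid = email.message_from_string(raw_text)["Message-ID"]
        self._items.pop(msgid, None)  # re-adding moves it to the newest end
        self._items[msgid] = raw_text
        while len(self._items) > self.max_size:
            self._items.popitem(last=False)  # drop the oldest copy
        return msgid

    def get(self, msgid):
        """Re-parse the pristine copy for training, or return None if gone."""
        raw = self._items.get(msgid)
        return None if raw is None else email.message_from_string(raw)

    def ids(self):
        """Still-queued ids, oldest first (for the admin's selection list)."""
        return list(self._items)
```

Storing the raw text rather than a parsed object keeps the copy byte-for-byte independent of later decoration or scrubbing.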

Use case A:

Spammer sends a message to the list
Message gets mis-categorized as good and forwarded to the list, but a pristine copy is held in a special queue.
Admin notices the problem and opens the admin UI within a reasonable time
Admin can recover the pristine copy of that message from the special queue (selecting from a list of still-queued messages, or better by pasting some id copied from the message header as received from the list)
Admin can train one or all the filters on that pristine copy

Use case B:

Subscriber sends a message to the list
Message gets mis-categorized as spam and is neither sent to the list nor kept in the moderation queue, but a pristine copy is held in a special queue.
Admin is notified of the problem (by an angry Subscriber) and opens the admin UI within a reasonable time
Admin can recover the pristine copy of that message from the special queue, selecting from a list of still-queued messages (no message header is available because the message was not forwarded)
Admin can train one or all the filters on that pristine copy and/or force re-processing of this message so that subscribers will receive it.

Use case C:

Someone sends a message to the list
Message gets categorized as unsure and held in the moderation queue; a pristine copy is held in the special queue anyway.
Admin is notified of the problem (by Mailman) and opens the admin UI within a reasonable time
Admin can see the message in the moderation queue and decide what to do, including training on one or all the filters.

Use case D:

The message was categorized correctly, or the admin didn't react within the allowed timeframe, so the pristine copy is silently deleted from the special queue.
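Use case D implies a time window as well as a size cap. A minimal sketch of time-based expiry, assuming a configurable window in seconds; `ExpiringStore` and its injectable `clock` are hypothetical, and the clock parameter exists only so the behaviour can be tested without waiting.

```python
import time


class ExpiringStore:
    """Keep pristine copies only for `window` seconds (use case D)."""

    def __init__(self, window, clock=time.monotonic):
        self.window = window
        self.clock = clock
        self._items = {}  # msgid -> (stored_at, raw_text)

    def add(self, msgid, raw_text):
        self._items[msgid] = (self.clock(), raw_text)

    def purge(self):
        """Silently drop copies older than the window; return how many went."""
        cutoff = self.clock() - self.window
        stale = [k for k, (stored_at, _) in self._items.items() if stored_at < cutoff]
        for k in stale:
            del self._items[k]
        return len(stale)

    def get(self, msgid):
        """Return the raw text if the admin is still within the window."""
        self.purge()
        entry = self._items.get(msgid)
        return None if entry is None else entry[1]
```

A real queue would probably combine this with the size cap, purging on whichever limit is hit first.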

...

I've been using the latest SB with my Evolution client and have been very impressed, although it does take a little bit of training.

For focused traffic (such as on a list), training on 50 good and 50 spam messages is enough for more than 99.9% accuracy. At least, that is my experience.

-- Simone Piunno, chief architect Wireless Solutions SPA - DADA group Europe HQ, via Castiglione 25 Bologna web:www.wseurope.com tel:+390512966811 fax:+390512966800 God is real, unless declared integer

Jeff Warnica

7:34 p.m.

I'm not what you could consider a power user as far as Mailman is concerned, but it is my understanding that there is more than one way for messages to get into the 'admindb' (if that's what the to-be-moderated queue is called). Being flagged as spam would be another way to enter that queue. How it gets out (or not...) is up to admindb, not to either the current SA 'Handlers' or my proposed Handler+Configurator.

Unless I wanted to hack up that subsystem so that the choices a moderator had were based on how the message got there, I guess it's just a matter of having another canned error message. It's not much of a difference either way as far as code goes.

As for actual help, I think I'm well on my way. Some of the options remain up to the site admin (spamd/headers, regexp), but on/off and the scores (reject, to-queue, member bonus) are now per-list configurable. I've got to move it from my workstation over to a system that is (a bit more) 'live', and then add that other canned message... but that might already be in the Handler. If I can figure out how to get 'diff' to give me the new files and not just tell me about them, I could have a patchset up within a day.
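The per-list knobs listed here (on/off, reject score, to-queue score, member bonus) could fit together as in this sketch. The field names, default values and the idea that the member bonus is subtracted from the score are assumptions for illustration; the actual patch may work differently.

```python
from dataclasses import dataclass


@dataclass
class ListSpamSettings:
    """Hypothetical per-list settings; names and defaults are illustrative."""
    enabled: bool = True
    reject_score: float = 10.0
    queue_score: float = 5.0
    member_bonus: float = 2.0


def decide(settings, score, sender_is_member):
    """Return 'reject', 'hold' or 'accept' for a message with SA score `score`.

    Members get `member_bonus` points subtracted before the comparison, so
    borderline posts from subscribers are less likely to be held.
    """
    if not settings.enabled:
        return "accept"
    if sender_is_member:
        score -= settings.member_bonus
    if score >= settings.reject_score:
        return "reject"
    if score >= settings.queue_score:
        return "hold"
    return "accept"
```

For example, a score of 6.0 is held for a non-member but accepted for a member (6.0 - 2.0 = 4.0, below the to-queue score).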

On Thu, 2003-11-27 at 04:47, PieterB wrote:

...

Integrating moderation of spam messages that are held in 'admindb'.

Jeff wrote:

...
It would likely be hard-coded to three possible actions: discard, hold for moderation, or pass through, as I can't imagine that anyone would want to bounce spam.

I don't understand this. There are currently four options in 'admindb' moderation: defer/approve/reject/discard. It might be best to have a fifth option, "discard spam", if SpamAssassin integration is enabled on the list. That would make it possible to integrate SpamBayes learning or spam reporting for those messages (I can imagine that bouncing spam to a spamtrap is useful in some setups as well). On the other hand, it might be confusing for moderators to have two 'discard' options, and I doubt that many non-spam messages are discarded by moderators. Thoughts?

I'm willing to help you with the SA integration. I haven't heard Barry's opinion on integrating SpamAssassin and Mailman.

Barry Warsaw

30 Nov

11:05 p.m.

On Thu, 2003-11-27 at 13:34, Jeff Warnica wrote:

...

Unless I wanted to hack up that subsystem so that the choices a moderator had were based on how the message got there, I guess it's just a matter of having another canned error message. It's not much of a difference either way as far as code goes.

Oh, let me add a few other things...

I would implement a Handler module with a fairly generic class in it. The class has a method that accepts a message, and returns the message with some spam header added. I believe SA has a client/server interface to it, and I know SB does, so this method would probably be fairly similar for both systems (specialize it by derivation and overriding of this method).

Then, in the handler module, you could instantiate zero or more instances of this class (say if you wanted both SB and SA) and define some ordering, probably via an mm_cfg variable. Now, the handler module exposes a process() method that's basically a shim to pass the message to the filter method of each of the above instantiated classes.
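The design described in the two paragraphs above could be sketched as follows. Assumptions: `SpamFilter`, `KeywordFilter`, `SPAM_FILTERS` and `REGISTRY` are hypothetical names; `KeywordFilter` is a toy stand-in, since a real subclass would talk to spamd or the SpamBayes server instead; and `process(mlist, msg, msgdata)` mirrors the module-level signature Mailman 2.1 handler modules use.

```python
import email


class SpamFilter:
    """Generic filter: accepts a message, returns it with a spam header added."""

    header = "X-Spam-Filter"  # each subclass names its own header

    def check(self, msg):
        """Return the header value; subclasses override this."""
        raise NotImplementedError

    def filter(self, msg):
        msg[self.header] = self.check(msg)  # appends the header
        return msg


class KeywordFilter(SpamFilter):
    """Toy stand-in for a real client (SA via spamd, SB via its server)."""

    header = "X-Keyword-Spam"
    words = ("viagra", "lottery")

    def check(self, msg):
        body = msg.get_payload().lower()
        hits = sum(body.count(w) for w in self.words)
        return f"{'Yes' if hits else 'No'}, hits={hits}"


# Hypothetical mm_cfg-style setting: which filters run, and in what order.
SPAM_FILTERS = ["keyword"]
REGISTRY = {"keyword": KeywordFilter}
_instances = [REGISTRY[name]() for name in SPAM_FILTERS]


def process(mlist, msg, msgdata):
    """Handler entry point: a shim passing the message through each filter."""
    for f in _instances:
        msg = f.filter(msg)


if __name__ == "__main__":
    m = email.message_from_string("Subject: hi\n\nWin the lottery now! lottery!\n")
    process(None, m, {})
    print(m["X-Keyword-Spam"])  # Yes, hits=2
```

An empty `SPAM_FILTERS` list gives the "zero instances" case, in which `process()` leaves the message untouched.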

...

From here, the regexp matching bit I talked about earlier would simply match on the header added by SA or SB, and then take the action.
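The regexp step could then be a small ordered rule table over the added headers. Assumptions: the rules and actions are hypothetical; SpamAssassin is assumed to write `X-Spam-Status: Yes, score=7.3 required=5.0 ...` (older releases wrote `hits=` instead of `score=`); the SpamBayes header name and `spam; 0.99` format are also assumed.

```python
import email
import re

# Assumed score format inside X-Spam-Status ("score=" or older "hits=").
SCORE_RE = re.compile(r"\b(?:score|hits)=(-?\d+(?:\.\d+)?)")

# Hypothetical rules: (header, regex, action); the first match wins.
RULES = [
    ("X-Spam-Status", re.compile(r"^Yes\b"), "hold"),
    ("X-Spambayes-Classification", re.compile(r"^spam\b"), "discard"),
]


def header_action(msg, rules=RULES, default="accept"):
    """Return the action of the first rule whose header value matches."""
    for header, pattern, action in rules:
        for value in msg.get_all(header, []):
            if pattern.search(value):
                return action
    return default


def spam_score(msg, header="X-Spam-Status"):
    """Extract the numeric score from the header, or None if absent."""
    m = SCORE_RE.search(msg.get(header, ""))
    return float(m.group(1)) if m else None


if __name__ == "__main__":
    m = email.message_from_string(
        "X-Spam-Status: Yes, hits=7.3 required=5.0\nSubject: x\n\nbody\n")
    print(header_action(m), spam_score(m))  # hold 7.3
```

Because the rules only look at headers, the same table works whichever filters ran earlier in the pipeline.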

-Barry
